Skewed distributions appear very often in practice. Unfortunately, the traditional Zipf distribution often fails to model them well. In this paper, we propose a new probability di...
Query optimization in data integration requires source coverage and overlap statistics. Gathering and storing the required statistics presents many challenges, not the least of wh...
Software archives contain historical information about the development process of a software system. Using data mining techniques rules can be extracted from these archives. In th...
As the amount of online text increases, the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of...
Chowdhury Mofizur Rahman, Ferdous Ahmed Sohel, Par...
Privacy-preserving data mining (PPDM) is an important topic to both industry and academia. In general there are two approaches to tackling PPDM, one is statistics-based and the oth...
Patrick Sharkey, Hongwei Tian, Weining Zhang, Shou...