Large 0-1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which in...
Existing web usage mining techniques focus only on discovering knowledge based on the statistical measures obtained from the static characteristics of web usage data. They do not ...
The output of a data mining algorithm is only as good as its inputs, and individuals are often unwilling to provide accurate data about sensitive topics such as medical history an...
One fundamental task in near-neighbor search as well as other similarity matching efforts is to find a distance function that can efficiently quantify the similarity between two o...
Since clustering is unsupervised and highly explorative, clustering validation (i.e. assessing the quality of clustering solutions) has been an important and long standing researc...