A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a lowdimensio...
The blogosphere has unique structural and temporal properties since blogs are typically used as communication media among human individuals. In this paper, we propose a novel tech...
Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarel...
Andrew Fast, Lisa Friedland, Marc Maier, Brian Tay...
How do we find a natural clustering of a real world point set, which contains an unknown number of clusters with different shapes, and which may be contaminated by noise? Most clu...
The problem of assessing the significance of data mining results on high-dimensional 0?1 data sets has been studied extensively in the literature. For problems such as mining freq...
Aristides Gionis, Heikki Mannila, Panayiotis Tsapa...