We present similarity-based methods to cluster digital photos by time and image content. This approach is general, unsupervised, and makes minimal assumptions regarding the struct...
Matthew L. Cooper, Jonathan Foote, Andreas Girgens...
The self-organizing map (SOM), a powerful method for data mining and cluster extraction, is very useful for processing data of high dimensionality and complexity. Visualization met...
In this paper, we compare and evaluate the effectiveness of different statistical methods in the task of cross-document coreference resolution. We created entity models for d...
This paper describes a method for linear text segmentation which is twice as accurate and over seven times as fast as the state-of-the-art (Reynar, 1998). Inter-sentence similarit...
We consider the problem of clustering data lying on multiple subspaces of unknown and possibly different dimensions. We show that one can represent the subspaces with a set of pol...