Spectral clustering refers to a flexible class of clustering procedures that can produce high-quality clusterings on small data sets but which has limited applicability to large-s...
In this paper, we introduce a simple, randomized dynamic data structure for storing multidimensional point sets, called a quadtreap. This data structure is a randomized, balanced ...
Detecting duplicates in data streams is an important problem that has a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not fe...
: Nonnegative matrix approximation (NNMA) is a popular matrix decomposition technique that has proven to be useful across a diverse variety of fields with applications ranging from...
Extractors and taggers turn unstructured text into entityrelation (ER) graphs where nodes are entities (email, paper, person, conference, company) and edges are relations (wrote, ...