We compare two algorithms for sorting out-of-core data on a distributed-memory cluster. One algorithm, Csort, is a 3-pass oblivious algorithm. The other, Dsort, makes three passes...
Although clustering under constraints is a current research topic, a hierarchical setting, in which a hierarchy of clusters is the goal, is usually not considered. This paper trie...
The Jaccard/Tanimoto coefficient is an important workload, used in a large variety of problems including drug design fingerprinting, clustering analysis, similarity web searching a...
Vipin Sachdeva, Douglas M. Freimuth, Chris Mueller
We revisit recently proposed algorithms for probabilistic clustering with pair-wise constraints between data points. We evaluate and compare existing techniques in terms of robust...
The World Wide Web (WWW) is a vast source of information, and the problem of information overload is more acute than ever. Due to noise in the WWW, it is becoming hard to find usable informati...