Data is often collected over a distributed network, but in many cases, is so voluminous that it is impractical and undesirable to collect it in a central location. Instead, we mus...
Most clustering algorithms operate by optimizing (either implicitly or explicitly) a single measure of cluster solution quality. Such methods may perform well on some data sets bu...
This paper presents a new clustering algorithm called DSCBC which is designed to automatically discover word senses for polysemous words. DSCBC is an extension of CBC Clustering [...
Noriko Tomuro, Steven L. Lytinen, Kyoko Kanzaki, H...
Replication in grid file systems can significantly improves I/O performance of data-intensive applications. However, most of existing replication techniques apply to individual ...
The present paper analyzes the usefulness of the normalized compression distance for the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene...