The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice. Most of the theoretical work is restricted to the c...
Text clustering is most commonly treated as a fully automated task without user supervision. However, we can improve clustering performance using supervision in the form of pairwi...
This paper presents a comprehensive statistical analysis of workloads collected on data-intensive clusters and Grids. The analysis is conducted at different levels, including Virt...
Instant communication technologies such as Instant Messaging (IM) have become widely popular. For this kind of large-scale mass-communication medium, clustering on its text conte...
While null-space-based linear discriminant analysis (NLDA) achieves good discriminant performance, this performance easily suffers from an implicit assumption of a Gaussian model with sa...