Abstract. Clustering high dimensional data with sparse features is challenging because pairwise distances between data items are not informative in high dimensional space. To addre...
A data set can be clustered in many ways depending on the clustering algorithm employed, parameter settings used and other factors. Can multiple clusterings be combined so that th...
Alexander P. Topchy, Anil K. Jain, William F. Punc...
The NITE XML Toolkit (NXT) provides library support for working with multimodal language corpora. We describe work in progress to explore its potential for the AMI project by appl...
The main problems in text classification are lack of labeled data, as well as the cost of labeling the unlabeled data. We address these problems by exploring co-training - an algo...
We propose the framework of mutual information kernels for learning covariance kernels, as used in Support Vector machines and Gaussian process classifiers, from unlabeled task da...