In the k-nearest neighbor (KNN) classifier, the nearest neighbors are drawn only from labeled data. This makes it unsuitable for data sets that contain very few labeled examples. In this ...
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...
The transition of search engine users' intents has been studied for a long time. The knowledge of intent transition, once discovered, can yield a better understanding of how di...
This paper investigates the application of randomized algorithms for large-scale SVM learning. The key contribution of the paper is to show that, by using ideas from random projections...
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. ...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...