This paper considers the problem of identifying on the Web compound documents (cDocs): groups of web pages that in aggregate constitute semantically coherent information entities...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Support vector machines (SVMs) have been widely used in multimedia retrieval to learn a concept in order to find the best matches. In such an SVM active learning environment, the ...
Gene network reconstruction is a multidisciplinary research area involving data mining, machine learning, statistics, ontologies, and other fields. A reconstructed gene network allows us t...
The paper is concerned with applying learning to rank to document retrieval. Ranking SVM is a typical method of learning to rank. We point out that there are two factors one must ...
Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou Hua...