Statistical machine learning techniques for data classification usually assume that all entities are i.i.d. (independent and identically distributed). However, real-world entities...
In automated text categorization, given a small number of labeled documents, it is very challenging, if not impossible, to build a reliable classifier that is able to achieve high...
Zenglin Xu, Rong Jin, Kaizhu Huang, Michael R. Lyu...
In this paper, we propose a new method of citation data clustering for author name disambiguation. Most citation data appearing in the reference section of scientific papers incl...
Information retrieval systems (IRSs) often have limited ability to recognize the same idea when it is expressed in different forms. One way to improve these systems is to take ...
Fabienne Moreau, Vincent Claveau, Pascale Sé...
Presentation of search results in Web-based information retrieval (IR) systems has been dominated by a textual form of information such as the title, snippet, URL, and/or file type...