Many applications on blog search and mining often meet the challenge of handling huge volume of blog data, in which one single blog could contain hundreds or even thousands of ent...
Jinfeng Zhuang, Steven C. H. Hoi, Aixin Sun, Rong ...
Finding biological entities (such as genes or proteins) that satisfy certain conditions from texts is an important and challenging task in biomedical information retrieval and tex...
Web Page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...
Finding latent patterns in high dimensional data is an important research problem with numerous applications. The most well known approaches for high dimensional data analysis are...
We introduce a multi-stage ensemble framework, ErrorDriven Generalist+Expert or Edge, for improved classification on large-scale text categorization problems. Edge first trains a ...