We investigate the problem of learning document classifiers in a multilingual setting, from collections where labels are only partially available. We address this problem in the ...
Entity information management (EIM) is a nascent IR research area that investigates how information about entities, rather than documents, is managed. It is motivated by the ...
This paper presents a novel method for extracting text from graphical document images. Graphical document images containing text and graphics components are considered as two-d...
Many social Web sites allow users to annotate content with descriptive metadata, such as tags, and, more recently, to organize content hierarchically. These types of structured ...
Anon Plangprasopchok, Kristina Lerman, Lise Getoor
Selecting and presenting content culled from multiple heterogeneous and physically distributed sources is a challenging task. The exponential growth of web data in modern time...