We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is a...
This paper presents a comprehensive overview of the TopX search engine, an extensive framework for unified indexing and querying large collections of unstructured, semistructured...
Latent Semantic Indexing (LSI) has been shown to be effective in recovering from synonymy and polysemy in text retrieval applications. However, since LSI ignores class labels of t...
Sutanu Chakraborti, Rahman Mukras, Robert Lothian,...
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can p...
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyv...
Latent Semantic Indexing (LSI) has proved effective at capturing the semantic structure of document collections. It is widely used in content-based text retrieval...