This paper integrates five linguistic resources, including Cilin, a Chinese-English dictionary, ASBC corpus, SemCor, and WordNet, to construct a Chinese-English WordNet. The resul...
We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, and single-pass) and two linguistically motivated text features (noun phrase he...
Vasileios Hatzivassiloglou, Luis Gravano, Ankineed...
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fit...
We define the problem of decomposing human-written summary sentences and propose a novel Hidden Markov Model solution to the problem. Human summarizers often rely on cutting and ...
We examine metadata and data-structure issues for the Historical Newspaper Digital Library. This project proposes to digitize and then do OCR and linguisting processing on several...