This paper presents Carnegie Mellon University’s experiments on the mixed named-page and homepage finding task of the TREC 12 Web Track. Our results were strong; we achieved the...
This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's ca...
Active learning (AL) is a framework that attempts to reduce the cost of annotating training material for statistical learning methods. While a lot of papers have been presented on...
Typographic and visual information is an integral part of textual documents. Most information extraction systems ignore most of this visual information, processing the text as a l...
Errors are unavoidable in advanced computer vision applications such as optical character recognition, and the noise induced by these errors presents a serious challenge to downstr...