This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...
GOD (General Ontology Discovery) is an unsupervised system that extracts semantic relations among domain-specific entities and concepts from texts. Operationally, it acts as a search...
Processing biological data often requires handling uncertain and sometimes inconsistent information. Particularly when coping with image segmentation tasks against bio...
We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content using a text-based algorithm as a...
Michel Galley, Kathleen McKeown, Eric Fosler-Lussi...
Truecasing is the process of restoring case information to badly cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based ...
Lucian Vlad Lita, Abraham Ittycheriah, Salim Rouko...