Abstract. Processing biological data often requires handling of uncertain and sometimes inconsistent information. Particularly when coping with image segmentation tasks against bio...
We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content using a text-based algorithm as a...
Michel Galley, Kathleen McKeown, Eric Fosler-Lussi...
Truecasing is the process of restoring case information to badly-cased or noncased text. This paper explores truecasing issues and proposes a statistical, language modeling based ...
Lucian Vlad Lita, Abraham Ittycheriah, Salim Rouko...
Collocational prepositional phrases like ten koste van (at the expense of), met het oog op (with an eye on), and onder het mom van (under the pretext of) are patterns of the form ...
Cross Language Information Retrieval community has brought up search engines over multilingual corpora, and multilingual text categorization systems. In this paper, we focus on th...