Truecasing is the process of restoring case information to badly-cased or noncased text. This paper explores truecasing issues and proposes a statistical, language modeling based ...
Lucian Vlad Lita, Abraham Ittycheriah, Salim Rouko...
Discourse chunking is a simple way to segment dialogues according to how dialogue participants raise topics and negotiate them. This paper explains a method for arranging dialogue...
In this paper, we present a method for identifying discourse marker usage in spontaneous speech based on machine learning. Discourse markers are denoted by special POS tags, and t...
Query ambiguity is a generally recognized problem, particularly in Web environments where queries are commonly only one or two words in length. In this study, we explore one techn...
Factored Statistical Machine Translation extends the Phrase Based SMT model by allowing each word to be a vector of factors. Experiments have shown effectiveness of many factors, ...