A principal weakness of conventional (i.e., non-hierarchical) phrase-based statistical machine translation is that it can only exploit continuous phrases. In this paper, we extend...
We explore the relation between word sense subjectivity and cross-lingual lexical substitution, following the intuition that good substitutions will transfer a word's (contex...
The use of well-nested linear context-free rewriting systems has been empirically motivated for modeling the syntax of languages with discontinuous constituents or relatively f...
The task of identifying the language of text or utterances has a number of applications in natural language processing. Language identification has traditionally been approached w...
Language identification is the task of identifying the language in which a given document is written. This paper describes a detailed examination of which models perform best under diffe...
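To make the language identification task described above concrete, here is a minimal sketch of one common baseline: a character trigram naive Bayes classifier with add-one smoothing and a uniform prior over languages. The abstracts above are truncated, so this is not presented as the method of either paper. The class name `NaiveBayesLangID`, the helper `char_ngrams`, and the toy training sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padding with spaces so word edges count."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLangID:
    """Hypothetical character n-gram naive Bayes classifier (uniform language prior)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}   # language -> Counter of n-gram frequencies
        self.totals = {}   # language -> total number of n-grams seen
        self.vocab = set() # all n-grams seen in training, across languages

    def fit(self, samples):
        """samples: iterable of (language, text) pairs."""
        for lang, text in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def log_likelihood(self, text, lang):
        """Sum of add-one-smoothed log probabilities of the text's n-grams under lang."""
        v = len(self.vocab)
        counts, total = self.counts[lang], self.totals[lang]
        return sum(
            math.log((counts[g] + 1) / (total + v))
            for g in char_ngrams(text, self.n)
        )

    def predict(self, text):
        """Return the training language with the highest likelihood for text."""
        return max(self.counts, key=lambda lang: self.log_likelihood(text, lang))


# Toy usage with invented training sentences.
model = NaiveBayesLangID().fit([
    ("en", "the quick brown fox jumps over the lazy dog and the cat sat on the mat"),
    ("de", "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte"),
])
print(model.predict("the dog and the cat"))
print(model.predict("die katze und der hund"))
```

Character n-grams are a natural choice for this sketch because they need no tokenizer and capture language-specific spelling patterns even in short inputs; a real system would train on far more text and might add a class prior or longer n-grams.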