Even in a massive corpus such as the Web, a substantial fraction of extractions appear infrequently. This paper shows how to assess the correctness of sparse extractions by utiliz...
Frequency counts from very large corpora, such as the Web 1T dataset, have recently become available for language modeling. Omission of low frequency n-gram counts is a practical ...
With the broad availability and increasing performance of commodity graphics processors (GPU), non-graphical applications have become an active field of research. However, leveragi...
Electronic written texts used in computermediated interactions (e-mails, blogs, chats, etc) present major deviations from the norm of the language. This paper presents an comparat...
tra Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text. In this paper, we propose a method to perform domain adapta...