Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of Ngram language models based on LOUDS, a succinct data stru...
A collection of 3208 reported errors of Chinese words were analyzed. Among which, 7.2% involved rarely used character, and 98.4% were assigned common classifications of their caus...
Binarization of n-ary rules is critical for the efficiency of syntactic machine translation decoding. Because the target side of a rule will generally reorder the source side, it ...
Modeling of individual users is a promising way of improving the performance of spoken dialogue systems deployed for the general public and utilized repeatedly. We define "im...
Discourse connectives are words or phrases such as once, since, and on the contrary that explicitly signal the presence of a discourse relation. There are two types of ambiguity t...