This article presents a method of extracting bilingual lexica composed of single-word terms (SWTs) and multi-word terms (MWTs) from comparable corpora of a technical domain. First,...
In this paper we present the creation of a Mexican Spanish version of the CMU Sphinx-III speech recognition system. We trained acoustic and N-gram language models with a ...
Cellular automata can be used to design high-performance natural solvers on parallel computers. This paper describes the development of applications using CARPET, a high-level prog...
Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
Reordering is currently one of the most important problems in statistical machine translation systems. This paper presents a novel strategy for dealing with it: statistical machin...