Fully automatic machine translation cannot produce high-quality translation; Dialog-Based Machine Translation (DBMT) is the only way to provide authors with a means of translating...
Data cleaning is the process of correcting anomalies in a data source that may, for instance, be due to typographical errors or duplicate representations of an entity. It is a cruc...
Clio is a joint research project between the University of Toronto and the IBM Almaden Research Center, started in 1999 to address both foundational and systems issues related to the ma...
Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge da...
Phonetic string transduction problems, such as letter-to-phoneme conversion and name transliteration, have recently received much attention in the NLP community. In the past few y...
Sittichai Jiampojamarn, Colin Cherry, Grzegorz Kon...