Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply ...
Jia Xu, Jianfeng Gao, Kristina Toutanova, Hermann ...
Parallel corpora are critical resources for machine translation research and development since parallel corpora contain translation equivalences of various granularities. Manual a...
We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four “morphologically rich” languages: Finnish, Estonian, Turkish, and ...
Finding semantically related words is a first step in the direction of automatic ontology building. Guided by the view that similar words occur in similar contexts, we looked at t...
Using the fact that the Parikh matrix mapping is not an injective mapping, the paper investigates some properties of the set of words with the same Parikh matrix; these words are ...