In this paper we show how to train statistical machine translation systems on reallife tasks using only non-parallel monolingual data from two languages. We present a modificatio...
We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects...
In this paper we analyze the statistical distribution of the keystream generator used by the stream ciphers RC4 and RC4A. Our first result is the discovery of statistical biases o...
In this paper, we present our Hindi to English and Marathi to English CLIR systems developed as part of our participation in the CLEF 2007 Ad-Hoc Bilingual task. We take a query tr...
Manoj Kumar Chinnakotla, Sagar Ranadive, Om P. Dam...
Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesauru...