With the ever-increasing growth of the Internet, numerous copies of documents become serious problem for search engine, opinion mining and many other web applications. Since parti...
Abstract—The grapheme codebook is a high-performing technique for offline writer identification. This paper considers whether the de facto standards for initial grapheme extrac...
The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible...
Marcis Pinnis, Radu Ion, Dan Stefanescu, Fangzhong...
Many applications involve multiple-modalities such as text and images that describe the problem of interest. In order to leverage the information present in all the modalities, on...
This paper reports on the participation of ITC-irst in the Cross Language Evaluation Forum 2003; in particular, in the monolingual, bilingual, small multilingual, and spoken docum...