This article presents a new freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of the Wikipedia and has been automatically enriched with l...
Samuel Reese, Gemma Boleda, Montse Cuadros, Llu&ia...
An important feature of spoken language corpora is existence of different spelling variants of words in transcription. So there is an important problem for linguist who works with...
Natural language processing technology has developed remarkably, but it is still difficult for computers to understand contextual meanings as humans do. The purpose of our work ha...
Semantic similarity is a key issue in many computational tasks. This paper goes into the development and evaluation of two common ways of automatically calculating the semantic si...
Yves Peirsman, Simon De Deyne, Kris Heylen, Dirk G...
We present a novel paradigm for statistical machine translation (SMT), based on a joint modeling of word alignment and the topical aspects underlying bilingual document-pairs, via...