The development of technologies to address machine translation and distillation of multilingual broadcast data depends heavily on the collection of large volumes of material from ...
The paper presents Bulgarian National Corpus project (BulNC) - a large-scale, representative, online available corpus of Bulgarian. The BulNC is also a monolingual general corpus,...
The manual transcription of human gesture behavior from video for linguistic analysis is a work-intensive process that results in a rather coarse description of the original motio...
In this paper we describe a corpus set together from two sub-corpora. The CINEMO corpus contains acted emotional expression obtained by playing dubbing exercises. This new protoco...
We present the first effort towards producing an Arabic Discourse Treebank, a news corpus where all discourse connectives are identified and annotated with the discourse relations...