The Enron Email Corpus provides "Real World" text in the business email domain, which is a target domain for many speech and language applications. We present a section ...
This paper presents the multimodal corpora that are being collected and annotated in the Nordic NOMCO project. The corpora will be used to study communicative phenomena such as fe...
Patrizia Paggio, Jens Allwood, Elisabeth Ahlsen, K...
Predicting possible code-switching points can help develop more accurate methods for automatically processing mixed-language text, such as multilingual language models for speech ...
Part-of-Speech tagging is generally performed by Markov models, based on bigram or trigram models. While Markov models have a strong concentration on the left context of a word, m...
In this paper, we describe a new methodology to develop mixed-initiative spoken dialog systems, which is based on the extensive use of simulations to accelerate the development pr...