This paper presents AnCora, a multilingual corpus annotated at different linguistic levels consisting of 500,000 words in Catalan (AnCora-Ca) and in Spanish (AnCora-Es). At presen...
In this paper, we present an original framework to model frame semantic resources (namely, FrameNet) using minimal supervision. This framework can be leveraged both to expand an e...
Marco Pennacchiotti, Diego De Cao, Paolo Marocco, ...
The number and sizes of parallel corpora keep growing, which makes it necessary to have automatic methods of processing them: combining, checking and improving corpora quality, et...
Speech synthesis by unit selection requires the segmentation of a large single speaker high quality recording. Automatic speech recognition techniques, e.g. Hidden Markov Models (...
Pierre Lanchantin, Andrew C. Morris, Xavier Rodet,...
We report on an exploratory, qualitative user study designed to identify users' goals underlying podcast search, the strategies used to gain access to podcasts, and how curre...