Today's Internet appliances feature user interface technologies almost unknown a few years ago: touch screens, styli, handwriting and voice recognition, speech synthesis, tin...
Marc Abrams, Constantinos Phanouriou, Alan L. Bato...
This article describes an architecture for translating speech into Spanish Sign Language (SSL). The architecture proposed is made up of four modules: speech recognizer, semantic a...
Narrative peaks are points at which the viewer perceives a spike in the level of dramatic tension within the narrative flow of a video. This paper reports on four approaches to na...
This paper presents a bottom-up approach that combines audio and video to simultaneously locate individual speakers in the video (2-D source localization) and segment their speech ...
This paper regards images with captions as a cross-media parallel corpus, and presents a corpus-based relevance feedback approach to combine the results of visual and textual runs....