Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of interme...
Y-Lan Boureau, Francis Bach, Yann LeCun, Jean Ponc...
This paper presents a system for recognizing emotion from clean and noisy speech. Geodesic distance was adopted to preserve the intrinsic geometry of emotional speech. Based on the g...
Mingyu You, Chun Chen, Jiajun Bu, Jia Liu, Jianhua...
In this paper, we propose the structure and components of a conversational television set (TV) to which we can ask anything about the broadcast content and receive the interesting in...
An important step to bring speech technologies into wide deployment as a functional component in man-machine interfaces is to free the users from close-talk or desktop mi...
Stephen M. Chu, Etienne Marcheret, Gerasimos Potam...
Audiovisual speech recognition (AVSR) systems have been proven superior to audio-only speech recognizers in noisy environments by incorporating features of the visual modality. ...
Alexander Vorwerk, Xiaohui Wang, Dorothea Kolossa,...