Finding faces in visually challenging environments is crucial to many applications, such as audio-visual automatic speech recognition, video indexing, person recognition, and vide...
Automatic temporal segmentation of music signals into note onsets is central for a large number of audio applications. In this paper, we present a variation of a previously existi...
VoIP applications require the ability to identify speakers in real time. This paper presents Compressed Speaker Recognition (CSR), an innovative approach to perform speaker recogn...
Charu C. Aggarwal, David P. Olshefski, Debanjan Sa...
Head pose estimation is a research area with many applications, e.g., in human-computer interface design or in the analysis of people’s focus of attention. The paper addres...
We propose a quad-tree scheme for obtaining sub-pixel estimates of interframe motion in the frequency domain. Our scheme is based on phase correlation and uses motion compensated ...