Usually the mel-frequency cepstral coefficients (MFCCs) are derived via Hamming windowed DFT spectrum. In this paper, we advocate to use a so-called multitaper method instead. Mul...
Tomi Kinnunen, Rahim Saeidi, Johan Sandberg, Maria...
In this paper, a robust feature for text-independent speaker recognition is proposed, which simulate the response mode of cochlear neurons in processing acoustic signal. The featu...
Abstract. The present work is concerned with speech recognition using a small or medium size vocabulary. The possibility to use the English speech recognizer for the recognition of...
Audiovisual speech recognition (AVSR) systems have been proven superior over audio-only speech recognizers in noisy environments by incorporating features of the visual modality. ...
Alexander Vorwerk, Xiaohui Wang, Dorothea Kolossa,...
There has been increasing interest recently in meeting understanding, such as summarization, browsing, action item detection, and topic segmentation. However, there is very limite...