We propose a novel method to automatically detect cultural differences around the world by using a large number of geotagged images on photo-sharing Web sites such as Flickr. W...
Detection of filled pauses is a challenging research problem which has several practical applications. It can be used to evaluate the spoken fluency skills of the speaker, to im...
Kartik Audhkhasi, Kundan Kandhway, Om Deshmukh, As...
In the Weighted Finite State Transducer (WFST) framework for speech recognition, we can reduce memory usage and increase flexibility by using on-the-fly composition which genera...
Tasuku Oonishi, Paul R. Dixon, Koji Iwano, Sadaoki...
In this paper, we introduce a new histogram equalization-based environmental model adaptation method for robust speech recognition in noisy environments. The proposed method adapts...
For effective training of acoustic and language models for spontaneous speech such as meetings, it is important to exploit texts available on a large scale, which may not b...