We present a framework to synchronize pop music to corresponding text lyric. We refine line level alignment achievable by existing work to syllabic level by using a dynamic progra...
In this paper, we propose an approach for detecting signs from natural scenes. The approach efficiently embeds multiresolution, adaptive search, and affine rectification algorithm...
This paper addresses the problem of developing appropriate features for use in direct modeling approaches to speech recognition, such as those based on Maximum Entropy models or S...
This paper studies the influence of n-gram language models in the recognition of sung phonemes and words. We train uni-, bi-, and trigram language models for phonemes and bi- and...
The human voice is primarily a carrier of speech, but it also contains non-linguistic features unique to a speaker and indicative of various speaker demographics, e.g. gender, nat...