For more efficient organization, browsing, and retrieval of digital video content, it is important to extract video structure information at both scene and shot levels. This paper pre...
This paper describes the results of the ICDAR 2005 competition for locating text in camera-captured scenes. For this we used the same data as the ICDAR 2003 competition, which has...
Unlike familiar clustering objects, text documents have sparse data spaces. A common way of representing a document is as a bag of its component words, but the semantic re...
In this paper, we study the use of spectral patterns to represent the characteristics of the rhythm of an audio signal. A function representing the position of onsets over time is...
In the automatic classification of music, many different segmentations of the audio signal have been used to calculate features. These include individual short frames (23 ms), lon...