We investigate methods of segmenting, visualizing, and indexing presentation videos by both audio and visual data. The audio track is segmented by speaker, and augmented with key ...
Human eyes have limited perception capabilities; for example, only 2 degrees of our 140 degree vision field provide the highest quality of perception. Due to this fact the idea of...
Large-scale distributed video surveillance systems pose new scalability challenges. Due to the large number of video sources in such systems, the amount of bandwidth required to t...
Learning the user’s semantics for CBIR involves two different sources of information: the similarity relations entailed by the content-based features, and the relevance relatio...
In this paper we unify two supposedly distinct tasks in multimedia retrieval. One task involves answering queries with a few examples. The other involves learning models for seman...