Multi-party voice-over-IP (MVoIP) services provide economical and natural group communication mechanisms for many emerging applications such as on-line gaming, distance collaborat...
We investigate methods of segmenting, visualizing, and indexing presentation videos using both audio and visual data. The audio track is segmented by speaker, and augmented with key ...
Human eyes have limited perception capabilities; for example, only 2 degrees of our 140-degree field of vision provide the highest quality of perception. Due to this fact, the idea of...
Learning the user’s semantics for CBIR involves two different sources of information: the similarity relations entailed by the content-based features, and the relevance relatio...
In this paper we unify two supposedly distinct tasks in multimedia retrieval. One task involves answering queries with a few examples. The other involves learning models for seman...