Most of the currently existing image retrieval systems make use of either low-level features or semantic (textual) annotations. A combined usage during annotation and retrieval is ...
We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter f...
Kai Nickel, Tobias Gehrig, Hazim Kemal Ekenel, Joh...
Given an image sequence of a scene consisting of multiple rigidly moving objects, multi-body structure-and-motion (MSaM) is the task to segment the image feature tracks into the d...
This paper focuses on the integration of multimodal features for sport video structure analysis. The method relies on a statistical model which takes into account both the shot co...
In this paper, we introduce a new approach for modeling
visual context. For this purpose, we consider the leaves of a
hierarchical segmentation tree as elementary units. Each
le...
Joseph J. Lim, Pablo Arbelaez, Chunhui Gu, and Jit...