In this paper, we adopt a direct modeling approach that uses conversational gesture cues to detect sentence boundaries, called SUs, in videotaped conversations. We treat the ...
We propose Recursive Compositional Models (RCMs) for simultaneous multi-view multi-object detection and parsing (e.g., view estimation and determining the positions of the object s...
Leo Zhu, Yuanhao Chen, Antonio Torralba, William F...
We present a method that automatically detects chewing events in surveillance video of a subject. First, an Active Appearance Model (AAM) is used to track a subject’s face acr...
We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which use boos...
Antonio Torralba, Kevin P. Murphy, William T. Free...
Selective attention in the human visual system is the mechanism by which humans focus on the most important parts of a visual scene. Many bottom-up computational mo...