A visual word lexicon can be constructed by clustering primitive visual features, and a visual object can be described by a set of visual words. Such a "bag-of-words" re...
Several authors have noticed that the common representation of images as vectors is sub-optimal. The process of vectorization eliminates spatial relations between some of the near...
A successful representation of objects in the literature is as a collection of patches, or parts, with a certain appearance and position. The relative locations of the different p...
In many vision problems, instead of having fully labeled training data, it is easier to obtain the input in small groups, where the data in each group is constrained to be from th...
Recent approaches to action classification in videos have used sparse spatio-temporal words encoding local appearance around interesting movements. Most of these approaches use a ...