—In increasingly many cases of interest in computer vision and pattern recognition, one is often confronted with the situation where data size is very large. Usually, the labels ...
The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, ...
Stefan Kombrink, Mirko Hannemann, Lukas Burget, Hy...
Pictorial structure (PS) models are extensively used for part-based recognition of scenes, people, animals and multi-part objects. To achieve tractability, the structure and param...
In this paper, we consider speaker identification for the co-channel scenario in which speech mixture from speakers is recorded by one microphone only. The goal is to identify both...
Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng...
Visual attributes expose human-defined semantics to object recognition models, but existing work largely restricts their influence to mid-level cues during classifier training....