We report about tools for the extraction of German multiword expressions (MWEs) from text corpora; we extract word pairs, but also longer MWEs of different patterns, e.g. verb-nou...
In this paper we describe an image based document retrieval system which runs on camera enabled mobile devices. "Mobile Retriever" aims to seamlessly link physical and di...
A novel method for the segmentation of double-sided ancient document images suffering from bleed-through effect is presented. It takes advantage of the level set framework to prov...
Models of bags of words typically assume topic mixing so that the words in a single bag come from a limited number of topics. We show here that many sets of bag of words exhibit a...
We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabele...
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin ...