In this paper, we present the results of our work that seeks to negotiate the gap between low-level features and high-level concepts in the domain of web document retrieval. This wo...
It is crucial in many information systems to organize short text segments, such as keywords in documents and queries from users, into a well-formed topic hierarchy. In this paper,...
We report an automatic feature discovery method that achieves results comparable to a manually chosen, larger feature set on a document image content extraction problem: the locat...
Our current concern is a scalable infrastructure for information retrieval (IR) with up-to-date retrieval results in the presence of frequent, continuous updates. Timely processin...
There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and info...
Mark Dredze, Aren Jansen, Glen Coppersmith, Ken Wa...