Most standard information retrieval models use a single source of information (e.g., the retrieval corpus) for query formulation tasks such as term and phrase weighting and query ...
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...
This paper describes the WebCLEF 2007 task. The task definition—which goes beyond traditional navigational queries and is concerned with undirected information search goals—c...
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if...
A voice search system requires a speech interface that can correctly recognize spoken queries uttered by users. The recognition performance strongly relies on a robust language mo...
Xiao Li, Patrick Nguyen, Geoffrey Zweig, Dan Bohus