Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if...
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
A new method for augmenting paper documents with electronic information is described that does not modify the format of the paper document in any way. Applicable to both commercia...
Jonathan J. Hull, Berna Erol, Jamey Graham, Qifa K...
Cyberinfrastructure integrates information and communication technologies to enable high-performance, distributed, and collaborative knowledge discovery, and promises to revolutio...
Search engines present fixed-length passages from documents ranked by relevance against the query. In this paper, we present and compare novel, language-model-based methods for extr...