The Web is a valuable source of language-specific resources, but the process of collecting, organizing, and utilizing these resources is difficult. We describe CorpusBuilder, an approa...
In this paper, we study the problem of learning block classification models to estimate block functions. We distinguish general models, which are learned across multiple sites, an...
Presentation of search results in Web-based information retrieval (IR) systems has been dominated by a textual form of information such as the title, snippet, URL, and/or file type...
For our participation in TREC 2004, we test Terrier, a modular and scalable Information Retrieval framework, in three tracks. For the mixed query task of the Web track, we employ ...
Results caching is an efficient technique for reducing the query processing load and is therefore commonly used in real search engines. This technique, however, bounds the maximum hit...
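As a rough illustration of results caching in general (not the method of the work summarized above), the sketch below assumes an LRU eviction policy and a simple lowercase/whitespace query normalization. `ResultsCache`, `max_hit_rate`, and the `compute_results` callback are hypothetical names. It also shows one general reason any results cache has a hit-rate ceiling: in this simplified model without result invalidation, the first occurrence of every distinct query is necessarily a miss.

```python
from collections import OrderedDict


def normalize(query):
    # Hypothetical normalization: lowercase and collapse whitespace.
    return " ".join(query.lower().split())


class ResultsCache:
    """Minimal LRU results cache: normalized query string -> result list (sketch)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query, compute_results):
        key = normalize(query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        results = compute_results(key)  # fall back to full query processing
        self._entries[key] = results
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return results

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def max_hit_rate(query_log):
    """Upper bound for any cache on this log: each distinct query misses at least once."""
    if not query_log:
        return 0.0
    distinct = len({normalize(q) for q in query_log})
    return 1 - distinct / len(query_log)
```

For a log such as `["web search", "tcp", "Web  search", "cache", "tcp", "web search"]`, three of the six queries are distinct after normalization, so no cache can exceed a 50% hit rate; an unbounded cache reaches that bound, while a capacity-2 LRU cache loses further hits to evictions.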