It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
In recent years, the vast amount of digitally available content has lead to the creation of many topic-centered digital libraries. Also in the domain of chemistry more and more di...
In this paper we describe a problem of discovering query clusters from a click-through graph of web search logs. The graph consists of a set of web search queries, a set of pages ...
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawl...
Familiar evaluation methodologies for information retrieval (IR) are not well suited to the task of comparing systems in many real settings. These systems and evaluation methods m...