A frozen 18.5 million page snapshot of part of the Web has been created to enable and encourage meaningful and reproducible evaluation of Web search systems and techniques. This c...
David Hawking, Nick Craswell, Paul B. Thistlewaite...
Abstract. Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research field...
Jovan Pehcevski, Anne-Marie Vercoustre, James A. T...
The Web offers rich relational data with different semantics. In this paper, we address the problem of document recommendation in a digital library, where the documents in questio...
Ding Zhou, Shenghuo Zhu, Kai Yu, Xiaodan Song, Bel...
The automatic generation of back-of-the book indexes seems to be out of sight of the Information Retrieval and Natural Language Processing communities, although the increasingly la...
Some large scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors’ s...