In a corpus of jokes, a human might judge two documents to be the "same joke" even if characters, locations, and other details are varied. A given joke could be retold w...
Forming test collection relevance judgments from the pooled output of multiple retrieval systems has become the standard process for creating resources such as the TREC, CLEF, and...
We consider the problem of large-scale retrieval evaluation, and we propose a statistical method for evaluating retrieval systems using incomplete judgments. Unlike existing techn...
We describe a new approach to information retrieval: algorithmic mediation for intentional, synchronous collaborative exploratory search. Using our system, two or more users with ...
Jeremy Pickens, Gene Golovchinsky, Chirag Shah, Pe...
A frozen 18.5 million page snapshot of part of the Web has been created to enable and encourage meaningful and reproducible evaluation of Web search systems and techniques. This c...
David Hawking, Nick Craswell, Paul B. Thistlewaite...