This paper considers the problem of identifying compound documents (cDocs) on the Web: groups of web pages that in aggregate constitute semantically coherent information entities...
To minimize redundancy and optimize coverage of multiple user interests, search engines and recommender systems aim to diversify their result sets. To date, these dive...
We present a principled methodology for filtering news stories by formal measures of information novelty, and show how the techniques can be used to custom-tailor newsfeeds based ...
Evgeniy Gabrilovich, Susan T. Dumais, Eric Horvitz
An e-lesson consists of a "body" and a "view". The body is the actual content of the e-lesson and is assumed to be an HTML document. The view i...
The novelty track was first introduced in TREC 2002. Given a TREC topic and an ordered list of documents, systems must find the relevant and novel sentences that should be retur...