Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine learned websearch ranking — a domain notorious for very large data sets. ...
Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal...
In this paper, conceptual frequency rate, a new frequency definition suitable for query stream mining, is introduced. An online single-pass algorithm called OFSD (Online Frequent...
Link farm spam and replicated pages can greatly deteriorate link-based ranking algorithms like HITS. In order to identify and neutralize link farm spam and replicated pages, we lo...
: We describe our participation in the TREC 2007 Enterprise track and detail our language modeling-based approaches. For document search, our focus was on estimating a mixture mode...
We consider the problem of building a P2P-based search engine for massive document collections. We describe a prototype system called ODISSEA (Open DIStributed Search Engine Archi...