We address the task of learning rankings of documents from search engine logs of user behavior. Previous work on this problem has relied on passively collected clickthrough data. ...
We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Intelligent Web search engines are extremely popular now. Currently, only commercial centralized search engines like Google can process terabytes of Web data. Alternative search en...
Sergey Chernov, Pavel Serdyukov, Matthias Bender, ...
Almost all conventional search engines employ a centralized architecture. However, such an engine is not suitable for fresh information retrieval because it spends a long time to collec...
In this paper we study the Web forum crawling problem, which is a fundamental step in many Web applications, such as search engines and Web data mining. As a typical user-crea...
Rui Cai, Jiang-Ming Yang, Wei Lai, Yida Wang, Lei ...