Subject-specific search facilities on health sites are usually built using manual inclusion and exclusion rules. These can be expensive to maintain and often provide incomplete c...
Thanh Tin Tang, David Hawking, Nick Craswell, Kath...
Wikipedia is one of the most popular information sources on the Web. The free encyclopedia is densely linked. The link structure in Wikipedia differs from the Web at large: interna...
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
A novel method for simultaneous keyphrase extraction and generic text summarization is proposed by modeling text documents as weighted undirected and weighted bipartite graphs. Sp...
It has become a promising direction to measure similarity of Web search queries by mining the increasing amount of clickthrough data logged by Web search engines, which record the...
Qiankun Zhao, Steven C. H. Hoi, Tie-Yan Liu, Soura...