One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
Structured retrieval aims to exploit the structural information of documents during search. It makes use of both the content and the structure of docum...
Saravadee Sae Tan, Tang Enya Kong, Gian Chand Sodh...
Open content web sites depend on users to produce information of value. Wikipedia is the largest and best-known such site. Previous work has shown that a small fraction of ed...
Katherine A. Panciera, Aaron Halfaker, Loren G. Te...
Web users are spending more of their time and creative energies within online social networking systems. While many of these networks allow users to export their personal data or ...
To search the web quickly, search engines partition the web index over many machines, and consult every partition when answering a query. To increase throughput, replicas are adde...
Costin Raiciu, Felipe Huici, Mark Handley, David S...