Methods for ranking World Wide Web resources according to their position in the link structure of the Web are receiving considerable attention, because they provide the first e...
We consider the problem of sampling URLs uniformly at random from the Web. A tool for sampling URLs uniformly can be used to estimate various properties of Web pages, such as the ...
Monika Rauch Henzinger, Allan Heydon, Michael Mitz...
Cloning (ad hoc reuse by duplication of design or code) speeds up development, but also hinders future maintenance. Cloning also hints at reuse opportunities that, if exploited sys...
Search engines present fixed-length passages from documents ranked by relevance against the query. In this paper, we present and compare novel, language-model-based methods for extr...
As originally conceived, the World Wide Web was intended for the purpose of sharing information. Many websites realise this aim by publishing pages from a data repository which su...