The Internet is a huge source of information. Search engines have indexed much of this information and can retrieve the webpages relevant to a given query. Howe...
In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the p...
We consider the problem of sampling URLs uniformly at random from the Web. A tool for sampling URLs uniformly can be used to estimate various properties of Web pages, such as the ...
Monika Rauch Henzinger, Allan Heydon, Michael Mitz...
A mass of heterogeneous, distributed and dynamic information on the World Wide Web (the Web) has resulted in "information overload". It's an important and urgent r...
Jicheng Wang, Xiangyu Jin, Yang Xiaojiang, Fuyan Z...
We extend the constellation model to include heterogeneous parts which may represent either the appearance or the geometry of a region of the object. The parts and their spatial co...