A variety of lossless compression schemes have been proposed to reduce the storage requirements of web graphs. One successful approach is virtual node compression [7], in which of...
Component Based Development (CBD) aims at constructing software through the integration, using interfaces and contracts, between pre-existing components. The main goal of this wor...
Robson P. de Souza, Marcelo N. Costa, Regina M. M....
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource d...
Soumen Chakrabarti, Martin van den Berg, Byron Dom
In TREC-10, Microsoft Research Asia (MSRA) participated in the Web track (ad hoc retrieval task and homepage finding task). The latest version of the Okapi system (Windows 2000 ve...
Jianfeng Gao, Guihong Cao, Hongzhao He, Min Zhang,...
Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our st...
Anirban Dasgupta, Arpita Ghosh, Ravi Kumar, Christ...