Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our st...
Anirban Dasgupta, Arpita Ghosh, Ravi Kumar, Christ...
The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems...
Clio is an existing schema-mapping tool that provides a user-friendly means to manage and facilitate the complex task of transforming and integrating heterogeneous data such as...
Haifeng Jiang, Howard Ho, Lucian Popa, Wook-Shin H...
Web-based communities have become important places for people to seek and share expertise. We find that networks in these communities typically differ in their topology from other...
A collaborative crawler is a group of crawling nodes, in which each crawling node is responsible for a specific portion of the web. We study the problem of collecting geographical...