The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structu...
Jayant Madhavan, David Ko, Lucja Kot, Vignesh Gana...
Searching for medical information on the Web is highly popular these days. To facilitate ordinary people to perform medical search and preliminary disease self-diagnosis, we have ...
Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that...
Dragomir R. Radev, Weiguo Fan, Hong Qi, Harris Wu,...
— The web today is increasingly characterized by social and real-time signals, which we believe represent two frontiers in information retrieval. In this paper, we present Earlyb...
Michael Busch, Krishna Gade, Brian Larson, Patrick...
Broder et al.’s [3] shingling algorithm and Charikar’s [4] random projection based approach are considered “state-of-theart” algorithms for finding near-duplicate web pag...