In this paper we present our technique for finding semantically similar clusters within web documents obtained from a set of queries retrieved from the Google search engine. This ...
We investigate the use of probabilistic models and cost-benefit analyses to guide the operation of a Web-based question-answering system. We first provide an overview of research ...
David Azari, Eric Horvitz, Susan T. Dumais, Eric B...
Broder et al.’s [3] shingling algorithm and Charikar’s [4] random projection based approach are considered “state-of-theart” algorithms for finding near-duplicate web pag...
The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynami...
Several initiatives for establishing standards for metadata models are being carried out at the moment, but everyone focuses on their own requirements when defining metadata attri...