Broder et al.’s [3] shingling algorithm and Charikar’s [4] random-projection-based approach are considered “state-of-the-art” algorithms for finding near-duplicate web pag...
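To make the two families of methods concrete, here is a minimal Python sketch of each idea. It compares word w-shingles by Jaccard resemblance, and it builds a 64-bit fingerprint in the spirit of Charikar's random projection (simhash), compared by Hamming distance. This is an illustrative assumption, not the implementation evaluated in the cited work. The shingle width `w = 4`, MD5 as the feature hash, term-frequency weights, and all function names are hypothetical choices. Broder's practical scheme also estimates resemblance with min-wise hashing rather than computing exact set intersections as done here.

```python
import hashlib
from collections import Counter


def shingles(text: str, w: int = 4) -> set:
    """Set of contiguous w-word shingles (hypothetical width w = 4)."""
    tokens = text.lower().split()
    if len(tokens) < w:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Exact Jaccard similarity of the two shingle sets (Broder-style resemblance)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def _hash64(token: str) -> int:
    # Stable 64-bit feature hash (built-in hash() is salted per process).
    # MD5 is an assumption here; any well-mixed 64-bit hash would do.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    """64-bit fingerprint: sign of a weighted sum of per-feature hash bits."""
    weights = Counter(text.lower().split())  # term frequency as feature weight
    acc = [0] * 64
    for token, weight in weights.items():
        h = _hash64(token)
        for i in range(64):
            # Each hash bit acts as a pseudo-random +1/-1 projection coordinate.
            acc[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(64):
        if acc[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(x: int, y: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(x ^ y).count("1")


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog near the river bank"
    b = "the quick brown fox jumps over the lazy dog near the river shore"
    print(resemblance(a, b))  # 9 shared shingles out of 11 distinct ones
    print(hamming(simhash(a), simhash(b)))
```

Near-duplicate pages score a resemblance close to 1 and yield fingerprints that differ in only a few bits; the simhash variant has the practical advantage that each page is reduced to a single 64-bit value that can be indexed and compared cheaply.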
To overcome the shortcomings posed by audio rendering of web pages for blind users, this paper implements an interaction technique where web pages are parsed so as to automaticall...
This article presents the most distinguishing features of the Argentinian web as found in a private sample of almost 10 million web pages from 150,000 sites collected in the early...
Gabriel Tolosa, Fernando Bordignon, Ricardo A. Bae...
For developers debugging their own code, augmenting the code of others, or trying to learn the implementation details of interactive behaviors, understanding how web pages work is...
Abstract. Extracting data from web pages using wrappers is a fundamental problem arising in a large variety of applications of vast practical interest. In this paper, we propose a...