The goal of this paper is to argue that personalization issues in Web applications need to be addressed from the very beginning of the application's development cycle. Since p...
Daniel Schwabe, Gustavo Rossi, Robson Guimarã...
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
Web applications are growing in demand, complexity and size, making general web applications difficult to design and maintain systematically. To aid in fulfilling these di...
Previous work showed that statistical analysis techniques could successfully be used to construct compact signatures of distinct operational problems in Internet server systems. B...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...