A major challenge in developing models for hypertext retrieval is to effectively combine content information with the link structure available in hypertext collections. Although s...
The increasing importance of search engines to commercial web sites has given rise to a phenomenon we call “web spam”, that is, web pages that exist only to mislead search eng...
This paper describes two visualisation algorithms that give an impression of current activity on a web site. Both focus on giving a sense of the trail of individual visitors withi...
Research into the Internet has experienced tremendous growth within the field of information systems. In this sense, the recent literature focuses on more complex research topic...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...