We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
There has been an increased demand for characterizing user access patterns using web mining techniques since the informative knowledge extracted from web server log files can not ...
Facing the retrieval problem according to the overwhelming set of documents online the adaptation of text categorization to web units has recently been pushed. The aim is to utiliz...
The advent of e-commerce has created a trend that brought thousands of catalogs online. Most of these websites are “taxonomy-directed”. A Web site is said to be ``taxonomydire...
An appreciation of the roles of genre and task is important in understanding how people browse the Web. Genre is characterized by content and form and is intimately linked to the ...
Carolyn R. Watters, Michael A. Shepherd, Forbes J....