How can a search engine automatically provide the best and most appropriate title for a result URL (link-title) so that users will be persuaded to click on the URL? We consider th...
Wikipedia is an example of the large, collaborative, semi-structured data sets emerging on the Web. Typically, before these data sets can be used, they must be transformed into struc...
Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediat...
People often use powerful tools to manage the documents they encounter, but very rarely to store the mental knowledge they glean from those documents. Popcorn is a personal knowle...
Stephen Davies, Scotty Allen, Jon Raphaelson, Emil...
For the task of near-duplicate document detection, both traditional fingerprinting techniques used in the database community and bag-of-words comparison approaches used in information...