The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a...
Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew M...
Social media such as Web forum often have dense interactions between user and content where network models are often appropriate for analysis. Joint non-negative matrix factorizat...
A common practice in work groups is to share links to interesting web pages. Moreover, passages in these web pages are often cut-and-pasted, and used in various other contexts. In...
Dereferencing a URI returns a representation of the current state of the resource identified by that URI. But, on the Web representations of prior states of a resource are also av...
Herbert Van de Sompel, Robert Sanderson, Michael L...
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...