In the recent years, we have witnessed a dramatic increment in the volume of spam email. Other related forms of spam are increasingly revealing as a problem of importance, special...
Missing web pages, URIs that return the 404 “Page Not Found” error or the HTTP response code 200 but dereference unexpected content, are ubiquitous in today’s browsing exper...
Martin Klein, Jeffery L. Shipman, Michael L. Nelso...
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
We have developed a technique to characterize software developers' styles using a set of source code metrics. This style fingerprint can be used to identify the likely author...
Familiar evaluation methodologies for information retrieval (IR) are not well suited to the task of comparing systems in many real settings. These systems and evaluation methods m...