This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify...
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Many searches on the web have a transactional intent. We argue that pages satisfying transactional needs can be distinguished from the more common pages that have some information...
Automatically generated content is ubiquitous in the web: dynamic sites built using the three-tier paradigm are good examples (e.g. commercial sites, blogs and other sites powered...
Abstract—Internet topology generation involves producing synthetic network topologies that imitate the characteristics of the Internet. Although the accuracy of newly developed n...