Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
A problem facing many textbook authors (including one of the authors of this paper) is the inevitable delay between new advances in the subject area and their incorporation in a n...
Timothy Miles-Board, Christopher Bailey, Wendy Hal...
Web search engines are often implemented as centralized systems. Designing and implementing a Web search engine in a distributed environment is a challenging engineering task that...
Ricardo A. Baeza-Yates, Aristides Gionis, Flavio J...
A key question regarding the future of the semantic web is “how will we acquire structured information to populate the semantic web on a vast scale?” One approach is to enter t...
Tom M. Mitchell, Justin Betteridge, Andrew Carlson...
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...