Sciweavers

2190 search results - page 316 / 438
» Unweaving a web of documents
SIGIR
2008
ACM
15 years 6 months ago
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
WWW
2004
ACM
16 years 7 months ago
Building a companion website in the semantic web
A problem facing many textbook authors (including one of the authors of this paper) is the inevitable delay between new advances in the subject area and their incorporation in a n...
Timothy Miles-Board, Christopher Bailey, Wendy Hal...
CIKM
2009
ACM
16 years 1 month ago
On the feasibility of multi-site web search engines
Web search engines are often implemented as centralized systems. Designing and implementing a Web search engine in a distributed environment is a challenging engineering task that...
Ricardo A. Baeza-Yates, Aristides Gionis, Flavio J...
SEMWEB
2009
Springer
16 years 1 month ago
Populating the Semantic Web by Macro-reading Internet Text
A key question regarding the future of the semantic web is “how will we acquire structured information to populate the semantic web on a vast scale?” One approach is to enter t...
Tom M. Mitchell, Justin Betteridge, Andrew Carlson...
WWW
2009
ACM
15 years 11 months ago
Extracting data records from the web using tag path clustering
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Gengxin Miao, Jun'ichi Tatemura, Wang-Pin Hsiung, ...