Sciweavers

1834 search results - page 221 / 367
» Web Mining in Search Engines
ICDM
2006
IEEE
164 views, Data Mining
16 years 14 days ago
Unsupervised Learning of Tree Alignment Models for Information Extraction
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table – a data structure that better lends itself to high-level...
Philip Zigoris, Damian Eads, Yi Zhang
SIGMOD
2000
ACM
85 views, Database
15 years 10 months ago
Finding Replicated Web Collections
Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
Junghoo Cho, Narayanan Shivakumar, Hector Garcia-M...
MSR
2009
ACM
15 years 11 months ago
On mining data across software repositories
Software repositories provide an abundance of valuable information about open source projects. With the increase in the size of the data maintained by the repositories, automated ext...
Prasanth Anbalagan, Mladen A. Vouk
WWW
2007
ACM
16 years 7 months ago
A large-scale study of robots.txt
Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...
Yang Sun, Ziming Zhuang, C. Lee Giles
HT
2009
ACM
16 years 28 days ago
Comparing the performance of US college football teams in the web and on the field
In previous research it has been shown that link-based web page metrics can be used to predict experts’ assessment of quality. We are interested in a related question: do expert...
Martin Klein, Olena Hunsicker, Michael L. Nelson