Sciweavers

» Keyword query cleaning
WWW
2008
ACM
16 years 7 months ago
Genealogical trees on the web: a search engine user perspective
This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...
Ricardo A. Baeza-Yates, Álvaro R. Pereira J...
WWW
2007
ACM
16 years 7 months ago
Search engines and their public interfaces: which APIs are the most synchronized?
Researchers of commercial search engines often collect data using the application programming interface (API) or by "scraping" results from the web user interface (WUI),...
Frank McCown, Michael L. Nelson
WWW
2007
ACM
16 years 7 months ago
Sliding window technique for the web log analysis
The results of Web query log analysis may be significantly shifted depending on the fraction of agents (non-human clients) that are not excluded from the log. To detect and ...
Nikolai Buzikashvili
WWW
2007
ACM
16 years 7 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2004
ACM
16 years 7 months ago
A novel heterogeneous data integration approach for P2P semantic link network
This paper proposes a novel approach to integrate heterogeneous data in P2P networks. The approach includes a tool for building P2P semantic link networks, mechanisms for peer sch...
Hai Zhuge, Jie Liu