Sciweavers

» Radial Structure of the Internet
WWW
2007
ACM
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2007
ACM
U-REST: an unsupervised record extraction system
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
Yuan Kui Shen, David R. Karger
WWW
2007
ACM
Classifying web sites
In this paper, we present a novel method for the classification of Web sites. This method exploits both structure and content of Web sites in order to discern their functionality....
Christoph Lindemann, Lars Littig
WWW
2006
ACM
XPath filename expansion in a Unix shell
Locating files based on file system structure, file properties, and maybe even file contents is a core task of the user interface of operating systems. By adapting XPath's po...
Kaspar Giger, Erik Wilde
WWW
2006
ACM
Mining clickthrough data for collaborative web search
This paper investigates group behavior patterns in search activity, based on Web search history data (i.e., clickthrough data), in order to boost search performance. We propose a ...
Jian-Tao Sun, Xuanhui Wang, Dou Shen, Hua-Jun Zeng...