Sciweavers

563 search results - page 69 / 113
Crawling the web for structured documents
SAINT
2005
IEEE
15 years 11 months ago
Learning Logic Wrappers for Information Extraction from the Web
This paper discusses a methodology for applying general-purpose first-order inductive learning to extract information from Web documents structured as unranked ordered trees. The...
Costin Badica, Elvira Popescu, Amelia Badica
HT
1996
ACM
15 years 10 months ago
HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering
HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link...
Ron Weiss, Bienvenido Vélez, Mark A. Sheldo...
ADC
2006
Springer
16 years 4 days ago
A two-phase rule generation and optimization approach for wrapper generation
Web information extraction is a fundamental issue for web information management and integration. A common approach is to use wrappers to extract data from web pages or documents...
Yanan Hao, Yanchun Zhang
WEBI
2005
Springer
15 years 11 months ago
Automated Metadata and Instance Extraction from News Web Sites
In this paper, we present automated techniques for extracting metadata instance information by organizing and mining a set of news Web sites. We develop algorithms that detect and...
Srinivas Vadrevu, Saravanakumar Nagarajan, Fatih G...
CIKM
2008
Springer
15 years 8 months ago
Dr. Searcher and Mr. Browser: a unified hyperlink-click graph
We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink...
Barbara Poblete, Carlos Castillo, Aristides Gionis