The dominant method for evaluating search engines is the Cranfield paradigm, but existing metrics do not account for some modern search engine features, such as document snippets...
Information needs are rarely satisfied directly on search engine result pages. Searchers usually need to click through to search results (landing pages) and follow search trails b...
Nowadays, information is primarily sought on the WWW. From a user perspective, readability is an important criterion for measuring the accessibility and thereby the quality ...
This work aims to provide a novel, site-specific web page segmentation and section importance detection algorithm, which leverages structural, content, and visual information. The...
We describe an adaptive method for extracting records from web pages. Our algorithm combines a weighted tree matching metric with clustering for obtaining data extraction patterns...
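To make the idea of a weighted tree matching metric concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a variant of the classic Simple Tree Matching dynamic program, with a hypothetical weighting in which a matched node at depth d contributes decay**d. The tree encoding (label, children), the function names, and the decay parameter are all illustrative assumptions. The clustering step is not shown.

```python
# Sketch only: a depth-weighted variant of Simple Tree Matching.
# Trees are (label, [children]) tuples; "decay" is a hypothetical weight parameter.

def weighted_match(a, b, depth=0, decay=0.5):
    """Weighted size of the best order-preserving match between trees a and b."""
    if a[0] != b[0]:
        return 0.0  # roots with different labels cannot be matched
    kids_a, kids_b = a[1], b[1]
    m, n = len(kids_a), len(kids_b)
    # table[i][j] = best match score using the first i children of a and first j of b
    table = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = weighted_match(kids_a[i - 1], kids_b[j - 1], depth + 1, decay)
            table[i][j] = max(table[i - 1][j], table[i][j - 1],
                              table[i - 1][j - 1] + pair)
    # the matched root contributes its own depth weight
    return decay ** depth + table[m][n]


def weighted_size(tree, depth=0, decay=0.5):
    """Sum of depth weights over all nodes of a tree."""
    return decay ** depth + sum(weighted_size(c, depth + 1, decay) for c in tree[1])


def similarity(a, b, decay=0.5):
    """Normalized similarity in [0, 1]; 1.0 for identical trees."""
    total = weighted_size(a, 0, decay) + weighted_size(b, 0, decay)
    return 2 * weighted_match(a, b, 0, decay) / total


row2 = ("tr", [("td", []), ("td", [])])
row3 = ("tr", [("td", []), ("td", []), ("td", [])])
print(similarity(row2, row3))
```

Under these assumptions, subtrees that score close to 1.0 against each other (for example, repeated table rows) would be natural candidates to group together as records; nodes near the root weigh more than deep leaves, so small differences low in the tree cost little.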