A wealth of information is available on the Web. Often, however, such data are hidden behind form interfaces that allow only a restricted set of queries over the underlying databases...
We propose mixtures of hidden Markov models for modelling clickstreams of web surfers. With this approach, the page categorization is learned from the data without the need for a (possibly cumb...
Current user interfaces for Web search, including browsers and search engine sites, typically treat search as a transient activity. However, people often conduct complex, multique...
Web pages such as news and shopping sites often use modular layouts. When used effectively, this practice allows authors to clearly present large amounts of information in a single...
This paper studies structured data extraction from Web pages, e.g., online product description pages. Existing approaches to data extraction include wrapper induction and automatic...