It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
This paper presents an extensive study of the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...
This paper presents a new query recommendation method that generates a recommended query list by mining large-scale user logs. Starting from the user logs of click-through data, we ...
We propose a method of classifying XML documents and extracting XML schemas from XML documents by inductive inference based on constraint logic programming. The goal of this work is to type ...
In the sponsored search model, search engines are paid by businesses that are interested in displaying ads for their site alongside the search results. Businesses bid for keywords...