Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Determining the user intent of Web searches is a difficult problem due to the sparse data available concerning the searcher. In this paper, we examine a method to determine the us...
Bernard J. Jansen, Danielle L. Booth, Amanda Spink
Long-term search history contains rich information about a user's search preferences. In this paper, we study statistical language modeling based methods to mine contextual i...
The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a pa...
After several years of development, the vision of the Semantic Web is gradually becoming reality. Large data repositories have been created and offer semantic information in a mac...