The Web is based on a browsing paradigm that makes it di cult to retrieve and integrate data from multiple sites. Today, the only way to do this is to build specialized applicatio...
We formulate a principle for classification with the knowledge of the marginal distribution over the data points (unlabeled data). The principle is cast in terms of Tikhonov styl...
Existing HTML mark-up is used only to indicate the structure and lay-out of documents, but not the document semantics. As a result web documents are difficult to be semantically p...
ifferent abstraction levels, resulting in isolated `information silos'. An increasing number of task-specific software tools aim to support developers, but this often results ...
In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, na...
Bernard J. Jansen, Danielle L. Booth, Amanda Spink