We investigate language models for informational and navigational web search. Retrieval on the web is a task that differs substantially from ordinary ad hoc retrieval. We perfor...
We demonstrate the usefulness of the uniform resource locator (URL) alone in performing web page classification. This approach is orders of magnitude faster than typical web page classific...
In this paper, we propose a versatile disambiguation approach which can be used to make explicit the meaning of structure-based information such as XML schemas, XML document struc...
Text similarity spans a spectrum, with broad topical similarity near one extreme and document identity at the other. Intermediate levels of similarity – resulting from summariza...
Donald Metzler, Yaniv Bernstein, W. Bruce Croft, A...
In this paper we address the problem of unsupervised Web data extraction. We show that unsupervised Web data extraction becomes feasible when supposing pages that are made up of r...