No search engine is perfect. A typical type of imperfection is the preference misalignment between search engines and end users, e.g., from time to time, web users skip higherrank...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
Structural information about a document is essential for structured query processing, indexing, and retrieval. A document page can be partitioned into a hierarchy of homogeneous r...
Given a document repository, search engine is very helpful to retrieve information. Currently, vertical search is a hot topic, and Google Scholar [4] is an example for academic se...
Ye Wang, Zhihua Geng, Sheng Huang, Xiaoling Wang, ...
In this short note we demonstrate the applicability of hyperlink downweighting by means of language model disagreement. The method filters out hyperlinks with no relevance to the ...