This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
In this paper, we introduce an information-theoretic method for estimating the usefulness of the hyperlink structure induced from the set of retrieved documents. We evaluate the e...
In this paper we present a fast and efficient matching algorithm, which consists of two key techniques: Spectral Correlation Based Feature Merge (SCBFM) and Two-Step Retrieval (TSR). ...
In this paper we introduce an information-theoretic approach and use techniques from the theory of Huffman codes to construct a sequence of binary sampling vectors to determine a s...
In this paper we examine user queries with respect to diversity: providing a mix of results across different interpretations. Using two query log analysis techniques (click entrop...
Paul Clough, Mark Sanderson, Murad Abouammoh, Serg...