Search engines process queries conjunctively to restrict the size of the answer set. Further, it is not rare to observe a mismatch between the vocabulary used in the text of Web p...
Rather than representing a document as a "bag of words," as most information retrieval applications do, we propose a model that represents a web page as sets of named enti...
Nan Di, Conglei Yao, Mengcheng Duan, Jonathan J. H...
The purpose of this paper is threefold. First, we study the evolution of the web based on data available from an earlier snapshot of the web and compare the results with those pre...
Wei-Tsen Milly Chiang, Markus Hagenbuchner, Ah Chu...
A staggering number of multimedia applications are being introduced every day. Yet, the inordinate delays encountered in retrieving multimedia documents make it difficult to use t...
Commercial tuple extraction systems have had some success extracting tuples by treating HTML pages as tree structures and using XPath queries to find attributes of t...