WebPMI is a popular web-based association measure for evaluating the semantic similarity between two queries (i.e., words or entities) by leveraging search results returned by search ...
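As a rough illustration of the idea, pointwise mutual information can be estimated from search-engine page counts. The sketch below follows a commonly used formulation and is not necessarily the exact definition in the paper above. The function name `web_pmi`, its parameters, and the co-occurrence threshold of 5 are assumptions made for this example; the hit counts would come from whatever search API is available.

```python
import math


def web_pmi(hits_p, hits_q, hits_pq, total_pages, min_cooccurrence=5):
    """Estimate PMI between two queries from page counts (hypothetical helper).

    hits_p, hits_q   -- page counts for each query alone
    hits_pq          -- page count for the conjunctive query "P AND Q"
    total_pages      -- assumed size of the search engine's index
    min_cooccurrence -- assumed noise threshold; at or below it the score is 0
    """
    if hits_p <= 0 or hits_q <= 0 or hits_pq <= min_cooccurrence:
        return 0.0
    # Turn raw counts into probability estimates.
    p_p = hits_p / total_pages
    p_q = hits_q / total_pages
    p_pq = hits_pq / total_pages
    # PMI: log ratio of the joint probability to the product of the marginals.
    return math.log2(p_pq / (p_p * p_q))


if __name__ == "__main__":
    # Toy counts chosen only to make the arithmetic easy to follow.
    print(web_pmi(hits_p=32, hits_q=32, hits_pq=16, total_pages=1024))
```

A higher score means the two queries co-occur more often than independence would predict. The threshold guards against spurious scores from very rare co-occurrences.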
Much recent research has focused on the content-based dissemination of XML data. However, due to the heterogeneous data schemas used by different data publishers even for data...
Proximity of query terms in a document is an important criterion in IR. However, no investigation has been made to determine the most useful term sequences for which proximity sho...
Jing Bai, Yi Chang, Hang Cui, Zhaohui Zheng, Gordo...
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
To solve this problem, we devised the HS-bitmap index, which is hierarchically composed of compressed summary-bit data. A summary bit in an upper matrix is obtained by logical...