Co-citation (the number of nodes linking to both nodes of a given pair) is often used heuristically to judge similarity between nodes in a complex network. We investigate the relat...
We propose a partitioning scheme for similarity search indexes called Maximal Metric Margin Partitioning (MMMP). MMMP divides the data on the basis of its distribution pat...
One current problem in Web Information Retrieval (IR) is identifying redundant information that exists in (replicated) Web documents. These documents can easily be found in...
A growing number of applications are built on top of search engines and issue complex structured queries. This paper contributes a customisable ranking-based processing of such qu...
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near-duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang, Xuemin Lin, Jeffrey Xu ...