We consider the problem of learning a similarity function from a set of positive equivalence constraints, i.e. 'similar' point pairs. We define the similarity in informa...
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
Sequential patterns of d-gaps exist pervasively in inverted lists of Web document collection indices due to the cluster property. In this paper the information of d-gap sequential...
Hidden Markov models (HMMs) have received considerable attention in various communities (e.g, speech recognition, neurology and bioinformatic) since many applications that use HMM...
In this paper, we propose a set of novel regression-based approaches to effectively and efficiently summarize frequent itemset patterns. Specifically, we show that the problem of ...