Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large datasets. In this paper, we present a scalable sampling imple...
Biosequences typically have a small alphabet, a long length, and patterns containing gaps (i.e., “don’t care”) of arbitrary size. Mining frequent patterns in such sequences ...
Efficient mining of frequent patterns from large databases has been an active area of research, since it is the most expensive step in association rule mining. In this paper, we pr...
We consider the problem of characterising sequences of heterogeneous symbolic data that arise from a common underlying temporal pattern. The data, which are subject to impreci...
The main challenge in mining sequential patterns is the high processing cost of support counting for a large number of candidate patterns. To solve this problem, the SPAM algorithm wa...