We show how frequently occurring sequential patterns may be found from large datasets by first inducing a finite state automaton model describing the data, and then querying the m...
We give a permutation approach to validation (estimation of out-sample error). One typical use of validation is model selection. We establish the legitimacy of the proposed permut...
Record linkage is the problem of identifying similar records across different data sources. The similarity between two records is defined based on domain-specific similarity functi...
In practice, learning from data is often hampered by the limited training examples. In this paper, as the size of training data varies, we empirically investigate several probabil...
The biclustering, co-clustering, or subspace clustering problem involves simultaneously grouping the rows and columns of a data matrix to uncover biclusters or sub-matrices of the...