Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalize...
Bill Yuan-chi Chiu, Eamonn J. Keogh, Stefano Lonar...
From the standpoint of the automated extraction of scientific knowledge, an important but little-studied part of scientific publications is the figures and accompanying captions....
William W. Cohen, Richard C. Wang, Robert F. Murph...
This paper considers the use of computational stylistics for performing authorship attribution of electronic messages, addressing categorization problems with as many as 20 differ...
Shlomo Argamon, Marin Saric, Sterling Stuart Stein
Distance function computation is a key subtask in many data mining algorithms and applications. The most effective form of the distance function can only be expressed in the conte...
Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal o...
Privacy is an important issue in data mining and knowledge discovery. In this paper, we propose using randomized response techniques to carry out the data mining computation. S...
High-dimensional directional data is becoming increasingly important in contemporary applications such as analysis of text and gene-expression data. A natural model for multivaria...
Arindam Banerjee, Inderjit S. Dhillon, Joydeep Gho...