Background: The post-genomic era is characterised by a torrent of biological information flooding the public databases. As a direct consequence, similarity searches starting with ...
Anne Friedrich, Raymond Ripp, Nicolas Garnier, Emm...
Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. These d...
Eui-Hong Han, George Karypis, Vipin Kumar, Bamshad...
Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational “vertices” with communication ...
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrel...
In the last several years, large multi-dimensional databases have become common in a variety of applications such as data warehousing and scientific computing. Analysis and explor...
Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel feature bagging approach for detecting outliers in...