We introduce a novel data mining technique for the analysis of gene expression. Gene expression is the effective production of the protein that a gene encodes. We focus on the cha...
Aleksandar Icev, Carolina Ruiz, Elizabeth F. Ryder
We propose to use AdaBoost to efficiently learn classifiers over very large and possibly distributed data sets that cannot fit into main memory, as well as on-line learning wher...
Common techniques tackling the task of classification in data mining employ ansatz functions associated to training data points to fit the data as well as possible. Instead, the fe...
In earlier work we have introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data set...
Large data resources are ubiquitous in science and business. For these domains, an intuitive view on the data is essential to fully exploit the hidden knowledge. Often, these data...