In earlier work we introduced and explored a variety of probabilistic models for the problem of answering selectivity queries posed to large sparse binary data set...
The Naïve Bayes (NB) classifier has long been considered a core methodology in text classification, mainly due to its simplicity and computational efficiency. There is an increasing n...
K-Means clustering is widely used in information retrieval and data mining. Distributed K-Means variants have already been proposed, but none of the past algorithms scales to large...
Odysseas Papapetrou, Wolf Siberski, Fabian Leitrit...
Background: A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery ...
Increased availability of large repositories of chemical compounds has created new challenges and opportunities for the application of data-mining and indexing techniques to probl...