Sciweavers

SC
2009
ACM
16 years 1 month ago
Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems
MapReduce provides a parallel and scalable programming model for data-intensive business and scientific applications. MapReduce and its de facto open source project, called Hadoop...
Jianwu Wang, Daniel Crawl, Ilkay Altintas
KDD
2009
ACM
16 years 1 month ago
CoCo: coding cost for parameter-free outlier detection
How can we automatically spot all outstanding observations in a data set? This question arises in a large variety of applications, e.g. in economy, biology and medicine. Existing ...
Christian Böhm, Katrin Haegler, Nikola S. M...
WIKIS
2009
ACM
16 years 23 days ago
Measuring the wikisphere
Due to the inherent difficulty in obtaining experimental data from wikis, past quantitative wiki research has largely been focused on Wikipedia, limiting the degree that it can be...
Jeff Stuckman, James Purtilo
NIPS
2007
15 years 7 months ago
Mining Internet-Scale Software Repositories
Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated c...
Erik Linstead, Paul Rigor, Sushil Krishna Bajracha...
KDD
2009
ACM
16 years 6 months ago
Extracting discriminative concepts for domain adaptation in text mining
One common predictive modeling challenge that occurs in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributi...
Bo Chen, Wai Lam, Ivor Tsang, Tak-Lam Wong