Sciweavers

5046 search results - page 128 / 1010
» Non-redundant data clustering
EDBT
2012
ACM
13 years 9 months ago
Clydesdale: structured data processing on MapReduce
MapReduce has emerged as a promising architecture for large scale data analytics on commodity clusters. The rapid adoption of Hive, a SQL-like data processing language on Hadoop (...
Tim Kaldewey, Eugene J. Shekita, Sandeep Tata
KDD
2003
ACM
16 years 6 months ago
Assessment and pruning of hierarchical model based clustering
The goal of clustering is to identify distinct groups in a dataset. The basic idea of model-based clustering is to approximate the data density by a mixture model, typically a mix...
Jeremy Tantrum, Alejandro Murua, Werner Stuetzle
BMCBI
2011
15 years 1 months ago
The dChip survival analysis module for microarray data
Background: Genome-wide expression signatures are emerging as potential markers for overall survival and disease recurrence risk as evidenced by recent commercialization of gene ex...
Samir B. Amin, Parantu K. Shah, Aimin Yan, Sophia ...
CCGRID
2011
IEEE
14 years 10 months ago
A Segment-Level Adaptive Data Layout Scheme for Improved Load Balance in Parallel File Systems
Parallel file systems are designed to mask the ever-increasing gap between CPU and disk speeds via parallel I/O processing. While they have become an indispensable compo...
Huaiming Song, Yanlong Yin, Xian-He Sun, Rajeev Th...
CLUSTER
2008
IEEE
16 years 29 days ago
Enabling lock-free concurrent fine-grain access to massive distributed data: Application to supernovae detection
We consider the problem of efficiently managing massive data in a large-scale distributed environment. We consider data strings of size in the order of Terabytes, shared and ac...
Bogdan Nicolae, Gabriel Antoniu, Luc Bougé