Abstract. In this paper we present a coarse-grained parallel algorithm, CONQUEST, for constructing boundederror summaries of high-dimensional binary attributed data in a distribute...
Aim of this paper is to propose a methodology for the definition of an instruction-level energy estimation framework for VLIW (Very Long Instruction Word) processors. The power mo...
Andrea Bona, Mariagiovanna Sami, Donatella Sciuto,...
Thread-Level Speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. ...
J. Gregory Steffan, Christopher B. Colohan, Antoni...
Tiling, a key transformation for optimizing programs, has been widely studied in literature. Parameterized tiled code is important for auto-tuning systems since they often execute...
Muthu Manikandan Baskaran, Albert Hartono, Sanket ...
An upgrade from dual-core to quad-core AMD processor on the Cray XT system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) has resulted in significa...
Sadaf R. Alam, Richard F. Barrett, Heike Jagode, J...