Fabric is a new system and language for building secure distributed information systems. It is a decentralized system that allows heterogeneous network nodes to securely share bot...
Jed Liu, Michael D. George, K. Vikram, Xin Qi, Luc...
The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has...
Srinivas Chellappa, Franz Franchetti, Markus P&uum...
Loop distribution is one of the most useful techniques to reduce the execution time of parallel applications. Traditionally, loop scheduling algorithms are implemented based on pa...
Cycles per Instruction (CPI) stacks break down processor execution time into a baseline CPI plus a number of miss event CPI components. CPI breakdowns can be very helpful in gaini...
This paper presents program transformations directed toward improving communication-computation overlap in parallel programs that use MPI’s collective operations. Our transforma...
Anthony Danalis, Ki-Yong Kim, Lori L. Pollock, D. ...