We give an overview of the algorithms and implementations in the high-performance MPI libraries MPI/SX and MPI/ES of some of the most important collective operations of MPI (the M...
It is widely known that executing operations in parallel on multiprocessor systems causes a corresponding increase in memory accesses. Since the memory and bus subsystems provide a li...
Grigoris Dimitroulakos, Michalis D. Galanis, Costa...
Despite voluminous previous research on adaptive compression, we found significant challenges when attempting to fully utilize both network bandwidth and CPU. We describe the Fine...
Consider a heterogeneous cluster system, consisting of processors with varying processing capabilities and network links with varying bandwidths. Given a DAG application to be sch...
Current Systems-On-Chip execute applications that demand extensive parallel processing. Networks-On-Chip (NoC) provide a structured way of realizing interconnections on silicon...
Nicolas Genko, David Atienza, Giovanni De Micheli,...