Embedded computing architectures can be designed to meet a variety of application specific requirements. However, optimized hardware can require compiler support to realize the po...
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s Blue Gene/L which can acc...
We consider unicast-based pipelined broadcast schemes for clusters connected by multiple Ethernet switches. By splitting a large broadcast message into segments and broadcasting t...
The development of Field Programmable Gate Arrays (FPGAs) had tremendous improvements in the last few years. They were extended from simple logic circuits to complex Systems-on-Ch...
This paper presents the alternatives available to support threadprivate data in OpenMP and evaluates them. We show how current compilation systems rely on custom techniques for im...