In this paper we argue that it is possible to couple the advantages of programming against the well-known RPC abstraction with asynchronous programming models adequate for wide-ar...
Abstract. Loop distribution and loop fusion are two effective loop transformation techniques to optimize the execution of programs in DSP applications. In this paper, we propo...
Meilin Liu, Qingfeng Zhuge, Zili Shao, Chun Xue, M...
Abstract. Register allocation in loops is generally performed after or during the software pipelining process. This is because doing a conventional register allocation at firs...
Abstract—Data movement within high performance environments can be a large bottleneck to the overall performance of programs. With the addition of continuous storage and usage of...
Irregular parallel algorithms pose a significant challenge for achieving high performance because of the difficulty of predicting memory access patterns or execution paths. Within an...