To make efficient use of CMPs with tens to hundreds of cores, it is often necessary to exploit fine-grain parallelism. However, managing tasks of a few thousand instructions is ...
Daniel Sanchez, Richard M. Yoo, Christos Kozyrakis
In the area of scientific visualization, input data sets are often very large. In visualization of Computational Fluid Dynamics (CFD) in particular, input data sets today can surp...
— In this paper, we present Parade, a novel and flexible parallel architecture for the deinterleaving of combined pulsetrains. This is a commonly performed task in various areas ...
As the disparity between processor and main memory performance grows, the number of execution cycles spent waiting for memory accesses to complete also increases. As a result, lat...
Teresa L. Johnson, Matthew C. Merten, Wen-mei W. H...
Good spatial locality alleviates both the latency and bandwidth problem of memory by boosting the effect of prefetching and improving the utilization of cache. However, convention...
Xiaoming Gu, Ian Christopher, Tongxin Bai, Chengli...