Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...
Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...
We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing...
Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has pr...
Web-service-related techniques have become popular for improving system integration and interaction. In distributed and dynamic environments, web services’ availability has been reg...
Lingshuang Shao, Junfeng Zhao, Tao Xie, Lu Zhang, ...
Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifics (cache...
Guy E. Blelloch, Phillip B. Gibbons, Harsha Vardha...