Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed w...
As cluster systems become increasingly popular, more and more parallel applications require not only computing power but also significant I/O performance. However, the I/O s...
The development of scalable parallel database systems requires the design of efficient algorithms for the join operation, which is the most frequent and expensive operatio...
As the number of cores per machine increases, memory architectures are being redesigned to avoid bus contention and sustain higher throughput. The emergence of Non-Uniform M...
While multicast has been studied extensively in many domains such as content streaming and file sharing, there is little research applying it to synchronous collaborations invo...