Only a handful of fundamental mechanisms for synchronizing the access of concurrent threads to shared memory are widely implemented and used. These include locks, condition variab...
For many applications, branch mispredictions and cache misses limit a processor’s performance to a level well below its peak instruction throughput. A small fraction of static i...
Deregulated wholesale markets for bulk electricity supplies are likely to deviate from the perfectly competitive ideal in many areas where transmission losses, costs and capacity ...
Sparse LU factorization with partial pivoting is important for many scienti c applications and delivering high performance for this problem is di cult on distributed memory machin...
Many years of CMOS technology scaling have resulted in increased power densities and higher core temperatures. Power and temperature concerns are now considered to be a primary cha...
Daniel C. Vanderster, Amirali Baniasadi, Nikitas J...