The recent trend in the processor industry of packing multiple processor cores onto a chip has increased the importance of automatic techniques for extracting thread-level paralleli...
Easwaran Raman, Neil Vachharajani, Ram Rangan, Dav...
This paper describes an algorithm that takes a trace (i.e., a sequence of numbers or vectors of numbers) as input, and from that produces a sequence of loop nests that, when run, ...
Although the best processor design for executing a specific workload does depend on the characteristics of the workload, it cannot be determined without factoring in the effect o...
Many programs go through phases as they execute. Knowing where these phases begin and end can be beneficial. For example, adaptive architectures can exploit such information to lo...
Transactional memory (TM) is a scalable and concurrent way to build atomic sections. One aspect of TM that remains unclear is how side-effecting operations – that is, those whic...