The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software c...
Joseph Gebis, Leonid Oliker, John Shalf, Samuel Wi...
In the embedded domain, custom hardware in the form of ASICs is often used to implement critical parts of applications when performance and energy efficiency goals cannot be met ...
Kevin Fan, Hyunchul Park, Manjunath Kudlur, Scott ...
The map-reduce model requires users to express their problem in terms of a map function that processes single records in a stream, and a reduce function that merges all mapped out...
Jackson H. C. Yeung, C. C. Tsang, Kuen Hung Tsoi, ...
This paper proposes a low power VLIW processor generation method by automatically extracting non-redundant activation conditions of pipeline registers for clock gating. It is impo...
We present a co-synthesis approach that accelerates reactive software processing by moving the calculation of complex expressions into external combinational hardware. The startin...