Sciweavers

1652 search results - page 149 / 331
» A performance analysis of local synchronization
Sort
View
DAC
2003
ACM
16 years 7 months ago
Multilevel global placement with retiming
Multiple clock cycles are needed to cross the global interconnects for multi-gigahertz designs in nanometer technologies. For synchronous designs, this requires retiming and pipel...
Jason Cong, Xin Yuan
ASPLOS
2009
ACM
16 years 7 months ago
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
Andrew Kerr, Dan Campbell, Mark Richards
PPOPP
2006
ACM
16 years 12 days ago
High-performance IPv6 forwarding algorithm for multi-core and multithreaded network processor
IP forwarding is one of the main bottlenecks in Internet backbone routers, as it requires performing the longest-prefix match at 10Gbps speed or higher. IPv6 forwarding further ex...
Xianghui Hu, Xinan Tang, Bei Hua
MICRO
2002
IEEE
108views Hardware» more  MICRO 2002»
15 years 11 months ago
Dynamic frequency and voltage control for a multiple clock domain microarchitecture
We describe the design, analysis, and performance of an on–line algorithm to dynamically control the frequency/voltage of a Multiple Clock Domain (MCD) microarchitecture. The MC...
Greg Semeraro, David H. Albonesi, Steve Dropsho, G...
IJHPCN
2006
116views more  IJHPCN 2006»
15 years 6 months ago
Implications of application usage characteristics for collective communication offload
Abstract-- The performance of collective communication operations is known to have a significant impact on the scalability of some applications. Indeed, the global, synchronous nat...
Ron Brightwell, Sue Goudy, Arun Rodrigues, Keith D...