Sciweavers

2155 search results - page 154 / 431
IPPS
2008
IEEE
16 years 26 days ago
Faster matrix-vector multiplication on GeForce 8800GTX
GPUs have recently acquired the programmability to perform general-purpose computation quickly by running tens of thousands of threads concurrently. This paper presents a new algorithm for d...
N. Fujimoto
LCN
2005
IEEE
16 years 20 hours ago
On Reorder Density and its Application to Characterization of Packet Reordering
A formal approach for characterizing, evaluating and modeling packet reordering is presented. Reordering is a phenomenon that is likely to become increasingly common on the Internet,...
Nischal M. Piratla, Anura P. Jayasumana, Tarun Ban...
ICCAD
2005
IEEE
16 years 3 months ago
Code restructuring for improving cache performance of MPSoCs
One of the critical goals in code optimization for MPSoC architectures is to minimize the number of off-chip memory accesses. This is because such accesses can be extremely cos...
Guilin Chen, Mahmut T. Kandemir
HOTI
2005
IEEE
16 years 1 day ago
Zero Copy Sockets Direct Protocol over InfiniBand - Preliminary Implementation and Performance Analysis
Sockets Direct Protocol (SDP) is a byte-stream transport protocol implementing the TCP SOCK_STREAM semantics utilizing transport offloading capabilities of the InfiniBand fabric. ...
Dror Goldenberg, Michael Kagan, Ran Ravid, Michael...
IPPS
2002
IEEE
15 years 11 months ago
JMPI: Implementing the Message Passing Standard in Java
The Message Passing Interface (MPI) standard provides a uniform Application Programmers Interface (API) that abstracts the underlying hardware from the parallel applications. Rece...
Steven Morin, Israel Koren, C. Mani Krishna