To support dynamic address translation in today's microprocessors, the first-level cache is accessed in parallel with a translation lookaside buffer (TLB). However, this curre...
Interconnection networks have been deployed as the communication fabric in a wide range of parallel computer systems. With recent technological trends allowing growing quantities ...
In this work, we propose a new FPGA design flow that combines the CUDA programming model from Nvidia with the state of the art high-level synthesis tool AutoPilot from AutoESL, to...
—Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. One bit transform (1BT) based ME algorithms have low comput...
In this paper we analyze the I/O access patterns of a widely-used biological sequence search tool and implement two variations that employ parallel-I/O for data access based on PV...
Yifeng Zhu, Hong Jiang, Xiao Qin, David R. Swanson