Abstract--On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When finegranularity of data communication and synchronization is requ...
Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir,...
We present Grouped Distributed Queues (GDQ), the first proportional share scheduler for multiprocessor systems that scales well with a large number of processors and processes. G...
Background: Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq) is used to probe chromatin dynamics. Although there are many algorith...
Real world computer vision systems highly depend on reliable, robust retrieval of motion cues to make accurate decisions about their surroundings. In this paper, we present a simp...
To understand the performance of modern Java systems one must observe execution in the context of specific architectures. It is also important that we make these observations usi...
J. Eliot B. Moss, Charles C. Weems, Timothy Richar...