Background: High-throughput methods of the genome era produce vast amounts of data in the form of gene lists. These lists are large and difficult to interpret without advanced com...
We present the IBM systems for the Rich Transcription 2007 (RT07) speaker diarization evaluation task on lecture meeting data. We first give an overview of our baseline system that was devel...
In this paper we develop an efficient implementation of a k-means clustering algorithm. The novel feature of our algorithm is that it uses coresets to speed up the computation. A ...
Peer-to-peer (p2p) systems offer an efficient means of data sharing among a large, dynamically changing set of autonomous nodes. Each node in a p2p system is connected...
Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication op...