This paper offers a novel look at using a dimensionality reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Biomolecular simulations produce more output data than can be managed effectively by traditional computing systems. Researchers need distributed systems that allow the pooling of...
Justin M. Wozniak, Paul Brenner, Douglas Thain, Aa...
This paper presents an improvement of the classical Non-negative Matrix Factorization (NMF) approach for dealing with local representations of image objects. NMF, when applied to...
This paper presents a general method for segmenting a vector-valued sequence into an unknown number of subsequences where all data points from a subsequence can be represented wit...
We present the Deep Store archival storage architecture, a large-scale storage system that stores immutable data efficiently and reliably for long periods of time. Archived data i...
Lawrence You, Kristal T. Pollack, Darrell D. E. Lo...