In this paper we introduce a concept and method for adaptively tuning model complexity online as more examples become available. Challenging classification pro...
We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as the underlying process generating the corpus and utilize a novel,...
MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-...
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, ...
In this paper we report on the acquisition and content of a new database intended for developing audio-visual speech recognition systems. This database supports a speaker dependen...
Genomes to Life (GTL), the U.S. Department of Energy Office of Science’s systems biology program, focuses on environmental microbiology. Over the next 10 to 20 years, GTL’s ke...
Marvin Frazier, David Thomassen, Aristides Patrino...