Science is increasingly driven by data collected automatically from arrays of inexpensive sensors. The collected data volumes require a different approach from the scientists'...
Stuart Ozer, Jim Gray, Alexander S. Szalay, Andrea...
We present an adaptive, quality-conscious distributed query-sampling framework for extracting high-quality text database samples. The framework divides the query-based samp...
Co-occurrence data is quite common in many real applications. Latent Semantic Analysis (LSA) has been successfully used to identify semantic relations in such data. However, LSA c...
For a given set of search engines, a search engine is redundant if its searchable contents can be found from other search engines in this set. In this paper, we propose a method t...
Geoscience studies produce data from various observations, experiments, and simulations at an enormous rate. With the proliferation of applications and data formats, the geoscience re...