Background: We consider effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular the False Discovery Rate (FDR) contro...
Learning Bayesian network structure from large-scale data sets, without any expertspecified ordering of variables, remains a difficult problem. We propose systematic improvements ...
Dimension attributes in data warehouses are typically hierarchical (e.g., geographic locations in sales data, URLs in Web traffic logs). OLAP tools are used to summarize the measu...
Abstract-- Answering approximate queries on string collections is important in applications such as data cleaning, query relaxation, and spell checking, where inconsistencies and e...
The discovery and construction of inherent regions in large spatial datasets is an important task for many research domains such as climate zoning, eco-region analysis, public heal...