Blocking is a technique commonly used in manual statistical analysis to account for confounding variables. However, blocking is not currently used in automated learning algorithms...
A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dim...
Abstract. Most of the work in machine learning assume that examples are generated at random according to some stationary probability distribution. In this work we study the problem...
Abstract Clustering Stability methods are a family of widely used model selection techniques for data clustering. Their unifying theme is that an appropriate model should result in...
We discuss information retrieval methods that aim at serving a diverse stream of user queries such as those submitted to commercial search engines. We propose methods that emphasi...
Hongyuan Zha, Zhaohui Zheng, Haoying Fu, Gordon Su...