We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) fo...
Shaojun Wang, Shaomin Wang, Russell Greiner, Dale ...
We consider the the problem of tracking heavy hitters and quantiles in the distributed streaming model. The heavy hitters and quantiles are two important statistics for characteri...
Measuring distance or some other form of proximity between objects is a standard data mining tool. Connection subgraphs were recently proposed as a way to demonstrate proximity be...
In this paper, we discuss a problem of finding risk patterns in medical data. We define risk patterns by a statistical metric, relative risk, which has been widely used in epidemi...
Jiuyong Li, Ada Wai-Chee Fu, Hongxing He, Jie Chen...
Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probabili...
Jennifer Neville, David Jensen, Lisa Friedland, Mi...