This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
We propose a novel framework for deriving approximations for intractable probabilistic models. This framework is based on a free energy (negative log marginal likelihood) and ca...
We develop methods for analyzing and constructing combined modulation/error-correcting codes (ECC codes), in particular codes that employ some form of reversed concatenation and w...
Jorge Campello de Souza, Brian H. Marcus, Richard ...
Multi-label problems arise in various domains such as multitopic document categorization and protein function prediction. One natural way to deal with such problems is to construc...
As a fundamental data mining task, frequent pattern mining has widespread applications in many different domains. Research in frequent pattern mining has so far mostly focused on ...
Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, Che...