Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can p...
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyv...
Search engine logs are an emerging new type of data that offers interesting opportunities for data mining. Existing work on mining such data has mostly attempted to discover knowl...
Proactive learning is a generalization of active learning designed to relax unrealistic assumptions and thereby reach practical applications. Active learning seeks to select the m...
We introduce a multi-stage ensemble framework, ErrorDriven Generalist+Expert or Edge, for improved classification on large-scale text categorization problems. Edge first trains a ...
Structured documents contain elements defined by the author(s) and annotations assigned by other people or processes. Structured documents pose challenges for probabilistic retrie...