In this paper, we propose a tree-structured multiclass classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured cla...
Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not en...
We present a model that improves entity-entity link modeling in a mixed membership stochastic block model by jointly modeling links with text about the entities that are linked i...
Random sampling is one of the most fundamental data management tools available. However, most current research involving sampling considers the problem of how to use a sample, and...
Today's data integration systems must be flexible enough to support the typical iterative and incremental process of integration, and may need to scale to hundreds of data sour...