The regular occurrence of disfluencies is a distinguishing characteristic of spontaneous speech. Detecting and removing such disfluencies can substantially improve the usefulness ...
Abstract. We show several PAC-style concentration bounds for learning unigrams language model. One interesting quantity is the probability of all words appearing exactly k times in...
The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia inf...
Determining the semantic role of sentence constituents is a key task in determining sentence meanings lying behind a veneer of variant syntactic expression. We present a model of n...
Cynthia A. Thompson, Roger Levy, Christopher D. Ma...
This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accompli...