To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. Specifically, we exploi...
An organization's data records are often noisy because of transcription errors, incomplete information, lack of standard formats for textual data or combinations thereof. A f...
Luis Gravano, Panagiotis G. Ipeirotis, Nick Koudas...
An essential goal of structuring lecture videos captured in live presentation is to provide a synchronized view of video clips and electronic slides. This paper presents an automa...
This paper describes a document image analysis system using multiple agents working on a pyramid structure to separate text from graphics in the image. Text strings appear as diff...
Chew Lim Tan, Bo Yuan, Weihua Huang, Qian Wang, Zh...
The Teko corpus composing model offers a decentralized, dynamic way of collecting high-quality text corpora for linguistic research. The resulting corpus consists of independent t...