It is widely recognized that the proliferation of annotation schemes runs counter to the need to re-use language resources, and that standards for linguistic annotation are becomi...
In a 12-month project we have developed a new, register-diverse, 55-million-word bilingual corpus--the New Corpus for Ireland (NCI)--to support the creation of a new English-to-Iri...
In this paper, a supervised learning system of word sense disambiguation is presented. It is based on conditional maximum entropy models. This system acquires the linguistic knowl...
Letter-substitution ciphers encode a document from a known or hypothesized language into an unknown writing system or an unknown encoding of a known writing system. It is a proble...
Despite its substantial coverage, NomBank does not account for all withinsentence arguments and ignores extrasentential arguments altogether. These arguments, which we call implic...