We describe the construction of the CODA corpus, a parallel corpus of monologues and expository dialogues. The dialogue part of the corpus consists of expository, i.e., informatio...
Confusion networks are a simple representation of multiple speech recognition or translation hypotheses in a machine translation system. A typical operation on a confusion network...
This paper presents an efficient algorithm for retrieving from a database of trees, all trees that match a given query tree appro,imately, that is, within a certain error toleranc...
Lexicalized reordering models play a crucial role in phrase-based translation systems. They are usually learned from the word-aligned bilingual corpus by examining the reordering ...
Overlapping and multi-version techniques are two popular frameworks that transform an ephemeral index into a multiple logical-tree structure in order to support versioning databas...