Morphologically complex terms composed from Greek or Latin elements are frequent in scientific and technical texts. Word-forming units are thus relevant cues for the identificatio...
Information extraction has conventionally treated HTML pages as plain-text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
Publication records are often found in the authors' personal home pages. If such a record is partitioned into a list of semantic fields of authors, title, date, etc., the uns...
Wei Zhang, Clement T. Yu, Neil R. Smalheiser, Vetl...
We address the problem of simplifying Portuguese texts at the sentence level, treating it as a "translation task". We use the Statistical Machine Translation (SMT) framewo...
This paper explores the possibility of extending the functional genre analysis model to account for the genre characteristics of non-linear, multi-modal, web-mediated documents. Th...