ABSTRACT. In the framework of the LegDoc project at Xerox Research Centre Europe, we are developing components for the semantic annotation of semi-structured documents. While certa...
In a series of publications, we have employed ontological theories and principles to evaluate and improve the quality of conceptual modeling grammars and models. In this artic...
One of the most challenging issues in managing the large and diverse data available on the World Wide Web is the design of interactive systems to organize and represent information...
We created a simple gold standard for English-Hungarian NP-level alignment on Orwell's 1984 by manually verifying the automatically generated NP chunking and manually aligning ...
The Enron Email Corpus provides "Real World" text in the business email domain, which is a target domain for many speech and language applications. We present a section ...