Data models and encoding formats for syntactically annotated text corpora need to deal with syntactic ambiguity; underspecified representations are particularly well suited for th...
In this paper, we describe our work on building a parallel treebank for a less studied and typologically dissimilar language pair, namely Swedish and Turkish. The treebank is a ba...
Spectral clustering is a powerful clustering method for document data set. However, spectral clustering needs to solve an eigenvalue problem of the matrix converted from the simil...
This paper describes a study of the levels at which different rhetorical relations occur in rhetorical structure trees. In a previous empirical study (Williams and Reiter, 2003) o...
We introduce the corpus of United States Congressional bills from 1947 to 1998 for use by language research communities. The U.S. Policy Agenda Legislation Corpus Volume 1 (USPALC...