In this paper we present a proposal to extend WordNet-like lexical databases by adding phrasets, i.e. sets of free combinations of words which are recurrently used to express a co...
We propose a new method for detecting errors in “gold-standard” part-of-speech annotation. The approach locates errors with high precision based on n-grams occurring in the cor...
We present a neural-network-based statistical parser, trained and tested on the Penn Treebank. The neural network is used to estimate the parameters of a generative model of left-...
The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used...
In this paper we introduce a dynamic programming algorithm to perform linear text segmentation by global minimization of a segmentation cost function which consists of: (a) within...