We compare machine learning approaches to sentence length reduction, applied to the automatic generation of subtitles for deaf and hearing-impaired people, with a method which relies on hand-...
Erik F. Tjong Kim Sang, Walter Daelemans, Anja H&o...
Compound words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We eva...
We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ...
We derive a generalization bound for multiclassification schemes based on grid clustering in categorical parameter product spaces. Grid clustering partitions the parameter space i...
In this paper we focus on adapting boosting to grammatical inference. We aim to improve the performance of state merging algorithms in the presence of noisy data by us...
Jean-Christophe Janodet, Richard Nock, Marc Sebban...