This paper describes SPMED, a system for robust and accurate linguistic parsing of medical documents which is used in several industrial products. The basic design criterion of th...
In this paper we propose a domain-independent text segmentation method, which consists of three components. Latent Dirichlet allocation (LDA) is employed to compute words semantic ...
We show that we can automatically classify semantically related phrases into 10 classes. Classification robustness is improved by training with multiple sources of evidence, inclu...
Ben Carterette, Rosie Jones, Wiley Greiner, Cory B...
This paper presents two methods for automatic detection of plagiarism in student essays, using Dutch text corpora to show their effectiveness. The first method is based on measur...
We study tree languages that can be defined in Δ2. These are tree languages definable by a first-order formula whose quantifier prefix is ∃*∀*, and simultaneously by a first-order formula whose quantifier prefix is ∀*∃*.