The de Bruijn digraph B(d,D) is usually defined by words of size D on an alphabet of cardinality d, through a cyclic left shift permutation on the words, after which the rightmos...
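As an illustration of the shift-based construction, the sketch below builds B(d,D) over the alphabet {0, ..., d-1}. It assumes the standard de Bruijn definition, in which the rightmost letter of the shifted word is then replaced by any of the d letters, so each word x1...xD has an arc to x2...xD y for every letter y. The function name `de_bruijn_digraph` is hypothetical and not taken from the abstract.

```python
from itertools import product


def de_bruijn_digraph(d, D):
    """Return (vertices, arcs) of B(d, D); arcs are (tail, head) pairs of tuples.

    Assumes the standard definition: each word of length D is cyclically
    shifted left, and the rightmost letter of the shifted word is then
    replaced by each of the d letters in turn.
    """
    vertices = list(product(range(d), repeat=D))
    arcs = []
    for word in vertices:
        shifted = word[1:] + word[:1]  # cyclic left shift: x2 ... xD x1
        for letter in range(d):
            # replace the rightmost letter of the shifted word
            arcs.append((word, shifted[:-1] + (letter,)))
    return vertices, arcs


vertices, arcs = de_bruijn_digraph(2, 3)
print(len(vertices), len(arcs))  # 8 16
```

Every vertex has out-degree d, so B(d,D) has d^D vertices and d^(D+1) arcs, including loops such as 000 -> 000.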
General-purpose ontologies (e.g. WordNet) are convenient, but they are not always scientifically valid. We draw on techniques from semantic class learning to improve the scientific...
We study dimensionality reduction or feature selection in the text document categorization problem. We focus on the first step in building text categorization systems, that is, the cho...
This paper explores the large-scale acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We have applied the "WordNet monosemous relatives" method t...
We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical...