Complex documents stored in a flat or partially marked up file format require layout sensitive preprocessing before any natural language processing can be carried out on their tex...
Automatic transliteration problem is to transcribe foreign words in one's own alphabet. Machine generated transliteration can be useful in various applications such as indexi...
This paper presents a method for automatically generating an association thesaurus from a text corpus, and demonstrates its application to information retrieval. The thesaurus gen...
We argue that some of the computational complexity associated with estimation of stochastic attributevalue grammars can be reduced by training upon an informative subset of the fu...
A deterministic finite state transducer is a fast device for analyzing strings. It takes O(n) time to analyze a string of length n. In this paper, an application of this technique...