Typographic and visual information is an integral part of textual documents. Most information extraction systems ignore most of this visual information, processing the text as a l...
User queries on extensible markup language (XML) documents are typically expressed as regular path expressions. A variety of indexing techniques for efficiently retrieving the re...
Many document images are rich in color and have complex backgrounds. To detect text in them, a standard approach utilizes both color and binary information. This often leads to t...
In this paper we extend the state of the art in utilizing background knowledge for supervised classification by exploiting the semantic relationships between terms explicated in O...
Meenakshi Nagarajan, Amit P. Sheth, Marcos Kawazoe...
Microsoft is producing high-quality documentation for Windows client-server and server-server protocols. Our group in the Windows organization is responsible for verifying the doc...
Wolfgang Grieskamp, Nicolas Kicillof, Dave MacDona...