An algorithm is presented that automatically matches images of presentation slides to the symbolic source file (e.g., PowerPointTM or AcrobatTM ) from which they were generated. T...
Electronic publishing of material digitized using imaging and OCR calls for a special delivery format capable of reconstructing original documents in a well-usable electronic form...
Understanding query intent is essential to generating appropriate rankings for users. Existing methods have provided customized rankings to answer queries with different intent. W...
We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...
We consider fast two-sided error-tolerant search that is robust against errors both on the query side (type alogrithm, find documents with algorithm) as well as on the document si...