Manual reconstruction of ripped-up documents can be a very difficult and time-consuming task. This paper discusses a semi-automatic toolset that can be used for reconstructing rip...
In this paper, we address the problem of database selection for XML document collections, that is, given a set of collections and a user query, how to rank the collections based o...
— We present a general approach for the hierarchical segmentation and labeling of document layout structures. This approach models document layout as a grammar and performs a glo...
PixED (from Pixel to Electronic Document) is aimed at converting document images into structured electronic documents which can be read by a machine for information retrieval. The...
We review the literature on automatic document formatting with an emphasis on recent work in the field. One common way to frame document formatting is as a constrained optimizatio...