In recent years, OLAP technologies have become one of the important applications in the database industry. In particular, the datacube operation proposed in [5] receives strong at...
Most research in text classification to date has used a “bag of words” representation in which each feature corresponds to a single word. This paper examines some alternative ...
PixED (from Pixel to Electronic Document) aims to convert document images into structured electronic documents that can be read by a machine for information retrieval. The...
In this paper, we present a comprehensive voting approach that takes entire layouts obtained from commercial OCR devices as input. Such a layout comprises segments of three kind...
A global data warehouse (DW) integrates data from multiple distributed heterogeneous databases and other information sources. A DW can be abstractly seen as a set of materi...