The National Taiwan University Library has built a digital library of historical documents about Taiwan. The content is unique in that it covers about 80% of all primary Chinese hi...
Automatic classification of documents is an important area of research with many applications in the fields of document searching, forensics and others. Methods to perform class...
Abstract. This paper suggests a novel representation for documents that is intended to improve precision. This representation is generated by combining two central techniques: Rand...
This paper presents an empirical study for improving the performance of text chunking. We focus on two issues: the problem of selecting feature spaces, and the problem of alleviat...
Commercial OCR packages work best with highquality scanned images. They often produce poor results when the image is degraded, either because the original itself was poor quality,...