We present a document-specific OCR system and apply it to a corpus of faxed business letters. Unsupervised classification of the segmented character bitmaps on each page, using a ...
Non-negative Matrix Factorization (NMF, [5]) and Probabilistic Latent Semantic Analysis (PLSA, [4]) have been successfully applied to a number of text analysis tasks such as docum...
This paper presents a new approach to identifying concepts expressed in a collection of email messages, and organizing them into an ontology or taxonomy for browsing. It incorpora...
— A Cyber-Physical System (CPS) integrates physical devices (e.g., sensors, cameras) with cyber (or informational) components to form a situation-integrated analytical system tha...
Lu An Tang, Xiao Yu, Sangkyum Kim, Jiawei Han, Wen...
Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have re...