We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-like...
In this paper we discuss the possible application of new concepts in web content extraction: utility assessment, utility annealing, and dynamic aggregated document generation. Aft...
Understanding the source, data, and documentation files associated with legacy systems in preparation for maintenance or reengineering is an increasingly important problem for man...
Graph recognition has not been thoroughly investigated to date, although document image understanding has attracted great interest and researchers have proposed many method...
Data-centric business applications comprise an important class of distributed systems that includes on-line stores, document management systems, and patient portals. However, their...