Sciweavers

7495 search results - page 337 / 1499
» Intelligent Document Processing
ITCC
2003
IEEE
15 years 12 months ago
A Method for Calculating Term Similarity on Large Document Collections
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
Wolfgang W. Bein, Jeffrey S. Coombs, Kazem Taghva
DOCENG
2003
ACM
15 years 12 months ago
Creating reusable well-structured PDF as a sequence of component object graphic (COG) elements
Portable Document Format (PDF) is a page-oriented, graphically rich format based on PostScript semantics and it is also the format interpreted by the Adobe Acrobat viewers. Althou...
Steven R. Bagley, David F. Brailsford, Matthew R. ...
ICDAR
2003
IEEE
16 years 7 hours ago
A Character Recognizer for Turkish Language
This paper presents particularly a contextual post processing subsystem for a Turkish machine printed character recognition system. The contextual post processing subsystem is bas...
Sait Ulas Korkmaz, G. Kirçiçegi, Y. ...
KDD
2008
ACM
16 years 7 months ago
Structured entity identification and document categorization: two tasks with one joint model
Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...
Indrajit Bhattacharya, Shantanu Godbole, Sachindra...
KDD
2007
ACM
16 years 7 months ago
Content-based document routing and index partitioning for scalable similarity-based searches in a large corpus
We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search ...
Deepavali Bhagwat, Kave Eshghi, Pankaj Mehra