Reliable indexing of documents having seal instances can be achieved by recognizing seal information. This paper presents a novel approach for detecting and classifying such multi...
Large collections of documents are commonly created around a database, where a typical database schema may contain hundreds of tables and thousands of columns. We developed a syst...
Carlos Garcia-Alvarado, Carlos Ordonez, Zhibo Chen...
The paper presents a study on large-scale automatic extraction of acronyms and associated expansions from Web data and from the user interactions with this data through Web search...
Automated event extraction remains a very difficult challenge requiring information analysts to manually identify key events of interest within massive, dynamic data. Many techniq...
We present a method for picture detection in document page images, which can come from scanned or camera images, or rendered from electronic file formats. Our method uses OCR to s...