The k-means algorithm is the method of choice for clustering large-scale data sets, and it performs exceedingly well in practice. Most of the theoretical work is restricted to the c...
The extraction of the relations of nested table headers to content cells is automated with a view to constructing narrow domain ontologies of semistructured web data. A taxonomy of...
Ramana C. Jandhyala, Mukkai S. Krishnamoorthy, Geo...
The PDF format is commonly used for the exchange of documents on the Web, and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
This paper presents an information retrieval methodology which uses Formal Concept Analysis in conjunction with semantics to provide contextual answers to users’ queries. User f...
A new method for augmenting paper documents with electronic information is described that does not modify the format of the paper document in any way. Applicable to both commercia...
Jonathan J. Hull, Berna Erol, Jamey Graham, Qifa K...