As the rapid growth of PDF document in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...
Latent Semantic Indexing (LSI) has been validated to be effective on many small scale text collections. However, little evidence has shown its effectiveness on unsampled large sca...
A new trend in the field of pattern matching is to design indexing data structures which take space very close to that required by the indexed text (in entropy-compressed form) an...
Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan, Je...
Nowadays, information is primarily searched on the WWW. From a user perspective, the readability is an important criterion for measuring the accessibility and thereby the quality ...
Abstract. Content-oriented XML retrieval systems support access to XML repositories by retrieving, in response to user queries, XML document components (XML elements) instead of wh...