XML Schema has emerged as a promising data model that unites structured and unstructured content. The Oracle database has led the commercial database community in integrating supp...
We consider the problem of building a P2P-based search engine for massive document collections. We describe a prototype system called ODISSEA (Open DIStributed Search Engine Archi...
A large fraction of the useful web comprises specification documents that largely consist of ⟨attribute name, numeric value⟩ pairs embedded in text. Examples include product in...
Research in information extraction (IE) concerns the generation of wrappers that can extract particular information from semistructured Web documents. Similar to compiler gener...
We present the design and implementation of the XEBRA system. XEBRA is an integrated programming environment for XML processing and browsing on which users can build their own XML pro...