In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Information integration and retrieval are useful tasks in many information systems. In these systems, it is far from an easy task to directly integrate information from natural lan...
Semantic heterogeneity of information is a major barrier of information and system interoperability. Defining ontology of data and mapping ontologies among heterogeneous informati...
Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. While XML supports expressive query languages such as XPath and XQuery, these la...
The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured da...
Sudarshan S. Chawathe, Hector Garcia-Molina, Joach...