In this paper we review the history of systems for managing “Big Data” as well as today’s activities and architectures from the (perhaps biased) perspective of three “data...
Recent work in both the relational and the XML world has shown that the efficacy and efficiency of duplicate detection are enhanced by considering relationships between entities. Ho...
We present a system that enables flexible and efficient phrase matching in XML documents. Since XML allows structured and unstructured information to be interleaved, phrase matchi...
XML documents have recently become ubiquitous because they are applicable to a wide range of applications. Classification is an important problem in the data mining domain, ...
Duplicate detection is the problem of detecting different entries in a data source representing the same real-world entity. While research abounds in the realm of duplicate detect...