Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
Business-to-Business (B2B) technologies pre-date the Web. They have existed for at least as long as the Internet. B2B applications were among the first to take advantage of advance...
Abstract This paper describes the design, implementation, and performance characteristics of a commercial XQuery processing engine, the BEA streaming XQuery processor. This XQuery ...
Daniela Florescu, Chris Hillery, Donald Kossmann, ...
Estimating the cardinality (i.e. number of distinct elements) of an arbitrary set expression defined over multiple distributed streams is one of the most fundamental queries of in...
Record matching is the task of identifying records that match the same real world entity. This is a problem of great significance for a variety of business intelligence applicatio...