Entity matching (EM) is the task of identifying records from different data sources that refer to the same real-world entity. While EM is widely used in data integration and data...
We focus on the problem of efficient learning of dependency trees. Once grown, they can be used as a special case of a Bayesian network, for PDF approximation, and for many other u...
We formalize the problem of maintaining time-decaying aggregates and statistics of a data stream: the relative contribution of each data item to the aggregate is scaled down by a ...
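As an illustration of a time-decaying aggregate, the following minimal sketch computes a decayed sum by weighting each item according to its age. The names `decayed_sum` and `exponential_decay`, the half-life parameter, and the choice of exponential decay are assumptions made for this example only; the abstract above does not specify a particular decay function, and this naive version stores every item rather than maintaining the aggregate in small space.

```python
import math

def decayed_sum(items, now, decay):
    """Sum values weighted by decay(age), where age = now - timestamp.

    items: iterable of (timestamp, value) pairs.
    decay: function mapping a non-negative age to a weight in [0, 1].
    """
    return sum(value * decay(now - ts) for ts, value in items if ts <= now)

def exponential_decay(half_life):
    """Assumed decay function: the weight halves every `half_life` time units."""
    rate = math.log(2) / half_life
    return lambda age: math.exp(-rate * age)

# Hypothetical stream of (timestamp, value) pairs.
stream = [(0, 8.0), (10, 4.0), (20, 2.0)]
total = decayed_sum(stream, now=20, decay=exponential_decay(half_life=10))
```

With a half-life of 10, the item of age 20 is weighted by about 0.25, the item of age 10 by about 0.5, and the newest item by 1, so older items contribute progressively less to the sum.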
In this paper, we give a simple scheme for identifying approximate frequent items over a sliding window of size n. Our scheme is deterministic and does not make any assumption on ...
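To make the problem statement concrete, here is an exact brute-force baseline for frequent items over a sliding window. It is not the scheme described in the abstract above: it stores the entire window, whereas an approximate scheme would presumably aim to use less space. The class name `SlidingWindowCounter` and the threshold parameter `phi` are hypothetical names chosen for this sketch.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Exact item counts over the last n stream elements (baseline only)."""

    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.counts = Counter()

    def add(self, item):
        # Append the new item, then evict the oldest one if the window overflows.
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.n:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, phi):
        # Items whose count exceeds a phi fraction of the current window.
        threshold = phi * len(self.window)
        return {x for x, c in self.counts.items() if c > threshold}

counter = SlidingWindowCounter(n=4)
for item in ["a", "b", "a", "c", "a", "a", "b", "b"]:
    counter.add(item)
heavy = counter.frequent(phi=0.4)
```

After the eight items are processed, the window holds the last four elements, and `frequent` reports only items that are frequent within that window rather than over the whole stream.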
Performance prediction has gained increasing attention in the IR field since the middle of the past decade and has become an established research topic. The present work...