One of the most prominent data quality problems is the existence of duplicate records. Current data cleaning systems usually produce one clean instance (repair) of the input da...
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas...
Detecting and eliminating fuzzy duplicates is a critical data cleaning task that is required by many applications. Fuzzy duplicates are multiple seemingly distinct tuples which re...
When a private relational table is published using views, secrecy or privacy may be violated. This paper uses a formally-defined notion of k-anonymity to measure disclosure by vi...
A model for the co-evolution of patterns and classifiers is presented. The CellNet system for generating binary classifiers is used as a base for experimentation. The CellNet syste...
Taras Kowaliw, Nawwaf N. Kharma, Chris Jensen, Hus...
Conducting data mining on logs of web servers involves the determination of frequently occurring access sequences. We examine the problem of finding traversal patterns from web lo...