Deduplication is a key operation in integrating data from multiple sources. The main challenge in this task is designing a function that can resolve when a pair of records refer t...
In the last several years, large OLAP databases have become common in a variety of applications such as corporate data warehouses and scientific computing. To support interactive ...
A pervasive problem in large relational databases is identity uncertainty which occurs when multiple entries in a database refer to the same underlying entity in the world. Relati...
Abstract. We present a method for applying machine learning algorithms to the automatic classification of astronomy star surveys using time series of star brightness. Currently su...
Gabriel Wachman, Roni Khardon, Pavlos Protopapas, ...
Abstract. We address the problem of joint feature selection in multiple related classification or regression tasks. When doing feature selection with multiple tasks, usually one c...
Paramveer S. Dhillon, Brian Tomasik, Dean P. Foste...