For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information...
Background: Much of the public access cancer microarray data is asymmetric, belonging to datasets containing no samples from normal tissue. Asymmetric data cannot be used in stand...
Given a data set, a dynamical procedure is applied to the data points in order to shrink and separate, possibly overlapping clusters. Namely, Newton’s equations of motion are em...
In this paper we consider the problem of answering queries using views, which is important for data integration, query optimization, and data warehouses. We consider its simplest ...
Abstract. Scientists’ ability to generate and collect massive-scale datasets is increasing. As a result, constraints in data analysis capability rather than limitations in the av...
YongChul Kwon, Dylan Nunley, Jeffrey P. Gardner, M...