We consider the problem of finding highly correlated pairs in a large data set. That is, given a threshold that is not too small, we wish to report all the pairs of items (or binary attri...
The parallel coordinates technique has been widely used in information visualization applications and has achieved great success in visualizing multivariate data and perceiving the...
Most database systems allow query processing over attributes that are derived at query runtime (e.g., user-defined functions and remote data calls to web services), making them e...
Justin J. Levandoski, Mohamed F. Mokbel, Mohamed E...
Social media such as blogs, Facebook, Flickr, etc., present data in a network format rather than the classical IID distribution. To address the interdependency among data instances, ...
Information-theoretic clustering aims to use information-theoretic measures as the clustering criteria. A common practice on this topic is the so-called INFO-K-means, which perfor...