Background: Feature selection is an approach to overcome the 'curse of dimensionality' in complex researches like disease classification using microarrays. Statistical m...
In this work we design algorithms for clustering relational columns into attributes, i.e., for identifying strong relationships between columns based on the common properties and ...
—Parallel netCDF (PnetCDF) is a popular library used in many scientific applications to store scientific datasets. It provides high-performance parallel I/O while maintaining ...
Kui Gao, Wei-keng Liao, Alok N. Choudhary, Robert ...
In many data mining applications, online labeling feedback is only available for examples which were predicted to belong to the positive class. Such applications include spam filt...
Main memory is a critical resource when processing longrunning queries over data streams with state intensive operators. In this work, we investigate state spill strategies that h...