We investigate how random projection can best be used for clustering high-dimensional data. Random projection has been shown to have promising theoretical properties. In practice,...
We examine the set covering machine when it uses data-dependent half-spaces for its set of features and bound its generalization error in terms of the number of training errors an...
Mario Marchand, Mohak Shah, John Shawe-Taylor, Mar...
We present a novel approach to embedding data represented by a network into a low-dimensional Euclidean space. Unlike existing methods, the proposed method attempts to minimize an ...
Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers...
This work is concerned with the estimation of a classifier's accuracy. We first review some existing methods for error estimation, focusing on cross-validation and bootstrap,...