Clustering time-series data poses problems that do not exist in traditional clustering in Euclidean space. Specifically, a cluster prototype needs to be calculated, where common s...
In many text retrieval tasks, it is highly desirable to obtain a "similarity profile" of the document collection for a given query. We propose sampling-based techniques ...
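As a rough illustration of the idea above, the sketch below estimates a similarity profile (here taken to be a histogram of query-document similarity scores) from a uniform random sample of the collection, then scales the bin counts up to the full collection size. This is a generic sampling estimator and not the authors' technique. Cosine similarity over term-frequency `Counter`s, equal-width bins on [0, 1], and the names `cosine` and `estimate_similarity_profile` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    # Assumed similarity measure: cosine over term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def estimate_similarity_profile(query, docs, sample_size, bins=10, seed=0):
    """Hypothetical helper: estimate how many documents fall into each
    similarity bin [i/bins, (i+1)/bins) using a uniform random sample."""
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    scale = len(docs) / len(sample)  # each sampled doc stands for N/n docs
    counts = [0] * bins
    for doc in sample:
        s = cosine(query, doc)
        # Clamp so a score of exactly 1.0 lands in the last bin.
        counts[min(int(s * bins), bins - 1)] += 1
    return [c * scale for c in counts]


if __name__ == "__main__":
    docs = [Counter("apple pie".split()), Counter("banana".split()),
            Counter("apple".split()), Counter("cherry tart".split())]
    print(estimate_similarity_profile(Counter(["apple"]), docs, sample_size=2, bins=4))
```

When `sample_size` is at least the collection size, the result is the exact histogram; smaller samples trade accuracy for fewer similarity computations.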
We present a family of measures of proximity of an arbitrary node in a directed graph to a pre-specified subset of nodes, called the anchor. Our measures are based on three differ...
Amruta Joshi, Ravi Kumar, Benjamin Reed, Andrew To...
This paper examines how one can obtain state-of-the-art Chinese word segmentation using global linear models. We provide experimental comparisons that give a detailed roadmap for ...
We present K2GA, an algorithm for learning Bayesian network structures from data. K2GA uses a genetic algorithm to perform stochastic search, while employing a modified versio...