We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By "categorical data," we mean tables with fiel...
David Gibson, Jon M. Kleinberg, Prabhakar Raghavan
We use a combination of proven methods from time series analysis and machine learning to explore the relationship between temporal and semantic similarity in web query logs; we di...
Bing Liu 0003, Rosie Jones, Kristina Lisa Klinkner
Clustering algorithms such as k-means, the self-organizing map (SOM), or Neural Gas (NG) constitute popular tools for automated information analysis. Since data sets are becoming l...
The TNM (Tumor, Lymph Node, Metastasis) is a widely used staging system for predicting the outcome of cancer patients. However, the TNM is not accurate in prediction, pa...
In previous work on "transformed mixtures of Gaussians" and "transformed hidden Markov models", we showed how the EM algorithm in a discrete latent variable mo...