This paper addresses the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Given a user-specified minimum correlation threshold and a market basket database with N items and T transactions, an all-strong-pairs correlation query finds all item pairs with...
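Below is a brute-force sketch of what such a query computes. It assumes the correlation measure is Pearson's phi coefficient over binary item occurrences; the truncated abstract does not name the measure here, so that is an assumption. The sketch examines all N(N-1)/2 pairs and rescans the T transactions for each one. It therefore shows the query's semantics only, not an efficient algorithm. The function names `phi` and `all_strong_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def phi(sup_a, sup_b, sup_ab):
    """Pearson's phi coefficient for two binary items, from fractional supports."""
    denom = sqrt(sup_a * sup_b * (1 - sup_a) * (1 - sup_b))
    if denom == 0:
        # An item present in all or in no transactions has zero variance,
        # so phi is undefined; treat it as uncorrelated.
        return 0.0
    return (sup_ab - sup_a * sup_b) / denom


def all_strong_pairs(transactions, threshold):
    """Brute-force all-strong-pairs query: every item pair whose phi >= threshold."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    # Fraction of transactions containing each single item.
    support = {i: sum(i in t for t in sets) / n for i in items}
    result = []
    for a, b in combinations(items, 2):
        # Fraction of transactions containing both items of the pair.
        sup_ab = sum(a in t and b in t for t in sets) / n
        corr = phi(support[a], support[b], sup_ab)
        if corr >= threshold:
            result.append((a, b, corr))
    return result
```

For example, `all_strong_pairs([["a", "b"], ["a", "b"], ["c"], ["c"]], 0.5)` returns only the pair `("a", "b")` with phi equal to 1.0. Each of the pairs involving `c` has phi equal to -1.0, so they fall below the threshold.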
Skewed distributions appear very often in practice. Unfortunately, the traditional Zipf distribution often fails to model them well. In this paper, we propose a new probability di...
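For reference, the standard rank-frequency form of the Zipf distribution gives the item of rank k a probability proportional to a power of its rank. The exponent θ and the number of ranks N below are the usual parameters, but the notation is chosen here and does not come from the abstract. Taking logarithms shows that Zipf data fall on a straight line of slope -θ on log-log axes. Skewed data that curve away from that line are one way a poor Zipf fit can show up.

```latex
P(X = k) = \frac{k^{-\theta}}{\sum_{j=1}^{N} j^{-\theta}}, \qquad k = 1, \dots, N, \quad \theta > 0

\log P(X = k) = -\theta \log k - \log \sum_{j=1}^{N} j^{-\theta}
```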
Background: Support Vector Machines (SVMs), using a variety of string kernels, have been successfully applied to biological sequence classification problems. While SVMs achieve ...
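One well-known string kernel is the k-spectrum kernel. It compares two sequences by counting the length-k substrings (k-mers) they share. The sketch below is illustrative only: the abstract does not say which string kernels it uses. The function names and the default `k=3` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """k-spectrum kernel: inner product of the two sequences' k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Only k-mers present in both sequences contribute to the dot product.
    return sum(cx[m] * cy[m] for m in cx.keys() & cy.keys())


def gram_matrix(seqs, k=3):
    """Pairwise kernel values for a list of sequences (symmetric matrix)."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]
```

As one possible use, a Gram matrix like this can be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Because the kernel is an explicit dot product of k-mer count vectors, it is always a valid (positive semidefinite) kernel.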
In situ staining of a target mRNA at several time points during the development of a D. melanogaster embryo gives one a detailed spatio-temporal view of the expression pattern of ...