The output of a data mining algorithm is only as good as its inputs, and individuals are often unwilling to provide accurate data about sensitive topics such as medical history an...
In this paper, we propose a fast, memory-efficient, and scalable clustering algorithm for analyzing transactional data. Our approach has three unique features. First, we use the c...
This paper presents a new incremental learning solution for Linear Discriminant Analysis (LDA). We apply the concept of the sufficient spanning set approximation in each update st...
Text analysis tools are now required to process increasingly large corpora, which are often organized as small files (abstracts, news articles, etc.). Cloud computing offers a ...
Time series data is common in many settings, including scientific and financial applications, where the amount of data is often very large. We seek to support pred...