These days an increasing number of applications, especially in science and engineering, are dealing with a massive amount of data; hence they are dataintensive. Bioinformatics, da...
One major problem of existing methods to mine data streams is that it makes ad hoc choices to combine most recent data with some amount of old data to search the new hypothesis. T...
In this paper we introduce a modular, highly flexible, opensource environment for data generation. Using an existing graphical data flow tool, the user can combine various types...
Missing data handling is an important preparation step for most data discrimination or mining tasks. Inappropriate treatment of missing data may cause large errors or false result...
HadoopDB is a hybrid of MapReduce and DBMS technologies, designed to meet the growing demand of analyzing massive datasets on very large clusters of machines. Our previous work ha...