The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on th...
Anurag Ambekar, Charles B. Ward, Jahangir Mohammed...
We use clustering to derive new relations that augment the database schema used in the automatic generation of predictive features in statistical relational learning. Clustering improves...
Physical database design is important for query performance in a shared-nothing parallel database system, in which data is horizontally partitioned among multiple independent node...
Jun Rao, Chun Zhang, Nimrod Megiddo, Guy M. Lohman
For complex data mining queries, query optimization issues arise, similar to those for traditional database queries. However, few works have applied the cost-based query optim...
The prevailing model for digital preservation is that archives should be similar to a “fortress”: a large, protective infrastructure built to defend a relatively small collect...