Collaborative filtering (CF) is valuable in e-commerce, and for direct recommendations for music, movies, news etc. But today's systems have several disadvantages, including ...
We present a query-driven algorithm for the distributed indexing of large document collections within structured P2P networks. To cope with bandwidth consumption that has been ide...
Gleb Skobeltsyn, Toan Luu, Ivana Podnar Zarko, Mar...
Dimension attributes in data warehouses are typically hierarchical (e.g., geographic locations in sales data, URLs in Web traffic logs). OLAP tools are used to summarize the measu...
Essentially all data mining algorithms assume that the datagenerating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...
Modern storage systems use thousands of inexpensive disks to meet the storage requirement of applications. To enhance the data availability, some form of redundancy is used. For e...
Liping Xiang, Yinlong Xu, John C. S. Lui, Qian Cha...