Motivated by the problem of customer wallet estimation, we propose a new setting for multi-view regression, where we learn a completely unobserved target (in our case, customer wa...
Contextual text mining is concerned with extracting topical themes from a text collection with context information (e.g., time and location) and comparing/analyzing the variations...
As a fundamental data mining task, frequent pattern mining has widespread applications in many different domains. Research in frequent pattern mining has so far mostly focused on ...
Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, Che...
Motivated by numerous applications in which the data may be modeled by a variable subscripted by three or more indices, we develop a tensor-based extension of the matrix CUR decom...
Michael W. Mahoney, Mauro Maggioni, Petros Drineas
Various data mining applications involve data objects of multiple types that are related to each other, which can be naturally formulated as a k-partite graph. However, the resear...
Bo Long, Xiaoyun Wu, Zhongfei (Mark) Zhang, Philip...
Ontologies represent data relationships as hierarchies of possibly overlapping classes. Ontologies are closely related to clustering hierarchies, and in this article we explore th...
Jinze Liu, Qi Zhang, Wei Wang 0010, Leonard McMill...
Along with the blossom of open source projects comes the convenience for software plagiarism. A company, if less self-disciplined, may be tempted to plagiarize some open source pr...
Chao Liu 0001, Chen Chen, Jiawei Han, Philip S. Yu
While most software defects (i.e., bugs) are corrected and tested as part of the lengthy software development cycle, enterprise software vendors often have to release software pro...
Charles X. Ling, Victor S. Sheng, Tilmann F. W. Br...
There has been considerable interest in random projections, an approximate algorithm for estimating distances between pairs of points in a high-dimensional vector space. Let A Rn...
Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.)...