In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Essentially all data mining algorithms assume that the data-generating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...
Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the ...
Jieping Ye, Qi Li, Hui Xiong, Haesun Park, Ravi Ja...
We present an empirical study of teams that revealed the amount of extraneous individual work needed to enable collaboration: finding references to other people, finding files to ...
John C. Tang, James Lin, Jeffrey Pierce, Steve Whi...
Advances in location-enhanced technology are making it easier for us to be located by others. These new technologies present a difficult privacy tradeoff, as disclosing one's...
Sunny Consolvo, Ian E. Smith, Tara Matthews, Antho...