We collected file system content data from 857 desktop computers at Microsoft over a span of 4 weeks. We analyzed the data to determine the relative efficacy of data deduplication...
Many substantial geographic information systems (GIS) have been designed for use by expert users. As a result, nonexpert users often find them difficult to use. This paper present...
Random Indexing is a vector space technique that provides an efficient and scalable approximation to distributional similarity problems. We present experiments showing Random Inde...
Several methods for automatically generating labeled examples that can be used as training data for WSD systems have been proposed, including a semisupervised approach based on re...
Data mining is an increasingly active research field, but a large gap remains between data mining research and its application in real-world business....