Abstract. Automated online search is a powerful technique for performance diagnosis. Such a search can change the types of experiments it performs while the program is running, mak...
Distance metric is widely used in similarity estimation. In this paper we find that the most popular Euclidean and Manhattan distance may not be suitable for all data distribution...
This paper addresses the training of classification trees for weakly labelled data. We call ”weakly labelled data”, a training set such as the prior labelling information pro...
This paper proposes a cache hierarchy that enables Web search engines to efficiently process user queries. The different caches in the hierarchy are used to store pieces of data w...
: Many companies have recognized the strategic importance of the knowledge hidden in their large databases and have built data warehouses. Typically, updates are collected and appl...