Approximate Nearest Neighbor (ANN) methods such as Locality Sensitive Hashing, Semantic Hashing, and Spectral Hashing, provide computationally ecient procedures for nding objects...
Histograms are typically used to approximate data distributions. Histograms and related synopsis structures have been successful in a wide variety of popular database applications...
In the k-medoid problem, given a dataset P, we are asked to choose k points in P as the medoids. The optimal medoid set minimizes the average Euclidean distance between the points ...
: We develop and study the concept of dataflow process networks as used for example by Kahn to suit exact computation over data types related to real numbers, such as continuous fu...
Pseudo relevance feedback (PRF), which has been widely applied in IR, aims to derive a distribution from the top n pseudo relevant documents D. However, these documents are often ...