It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Abstract--We propose a new Consistent Weighted Sampling method, where the probability of drawing identical samples for a pair of inputs is equal to their Jaccard similarity. Our me...
The ability to aggregate huge volumes of queries over a large population of users allows search engines to build precise models for a variety of query-assistance features such as ...
In this paper, we devise a method for the estimation of the true support of itemsets on data streams, with the objective to maximize one chosen criterion among {precision, recall}...
Pierre-Alain Laur, Richard Nock, Jean-Emile Sympho...
Data retrieval and its integration is one of the major problems that face large and complex health organizations. This is especially relevant when patient information is produced i...