In this paper we address the problem of analyzing web log data collected at a typical online newspaper site. We propose a two-way clustering technique based on probability theory....
Hannes Wettig, Jussi Lahtinen, Tuomas Lepola, Petr...
In this paper we focus on the following problem in information management: given a large collection of recorded information and some knowledge of the process that is generating th...
Record deduplication is the task of merging database records that refer to the same underlying entity. In relational databases, accurate deduplication for records of one type is o...
Collecting large consistent data sets for real world software projects is problematic. Therefore, we explore how little data are required before the predictor performance plateaus...
Distributed sensor networks are highly prone to accidental errors and malicious activities, owing to their limited resources and tight interaction with the environment. Yet only a...
Claudio Basile, Meeta Gupta, Zbigniew Kalbarczyk, ...