A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
It is becoming increasingly common to construct databases from information automatically culled from many heterogeneous sources. For example, a research publication database can b...
Aron Culotta, Michael L. Wick, Robert Hall, Matthe...
In this paper, we introduce a system named Argo which provides intelligent advertising made possible from users’ photo collections. Based on the intuition that user-generated ph...
Xin-Jing Wang, Mo Yu, Lei Zhang, Rui Cai, Wei-Ying...
At Google, experimentation is practically a mantra; we evaluate almost every change that potentially affects what our users experience. Such changes include not only obvious user-...
Diane Tang, Ashish Agarwal, Deirdre O'Brien, Mike ...
: In the paper nine different approaches to missing attribute values are presented and compared. Ten input data files were used to investigate the performance of the nine methods t...