Data--call records, internet packet headers, or other transaction records--are coming down a pipe at a ferocious rate, and we need to monitor statistics of the data. There is no r...
MapReduce and stream processing are two emerging, but different, paradigms for analyzing, processing and making sense of large volumes of modern day data. While MapReduce offers t...
In plenty of scenarios, data can be represented as vectors mathematically abstracted as points in a Euclidean space. Because a great number of machine learning and data mining app...
Classifying and mining noise-free web pages will improve on accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...
Information diffusion and virus propagation are fundamental processes talking place in networks. While it is often possible to directly observe when nodes become infected, observi...
Manuel Gomez-Rodriguez, Jure Leskovec, Andreas Kra...