We describe an approach for pipelining nested data collections in scientific workflows. Our approach logically delimits arbitrarily nested collections of data tokens using special...
This paper addresses the problem of scheduling concurrent jobs on clusters where application data is stored on the computing nodes. This setting, in which scheduling computations ...
Michael Isard, Vijayan Prabhakaran, Jon Currey, Ud...
Nowadays, structured data such as sales and business forms are stored in data warehouses for decision makers to use. Further, unstructured data such as emails, html texts, images,...
R is a numerical computing environment that is widely popular for statistical data analysis. Like many such environments, R performs poorly for large datasets whose sizes exceed t...
This paper reviews literature on scalable data structures for searching in a distributed computing environment. Starting with a system where one server manages a file of a given s...