We present a reinforcement learning architecture, Dyna-2, that encompasses both sample-based learning and sample-based search, and that generalises across states during both learni...
Automatic tools for finding software errors require a set of specifications before they can check code: if they do not know what to check, they cannot find bugs. This paper presen...
Ted Kremenek, Paul Twohey, Godmar Back, Andrew Y. ...
The majority of the work in the area of Markov decision processes has focused on expected values of rewards in the objective function and expected costs in the constraints. Althou...
For a while it seemed possible to pretend that all interaction between an algorithm and its environment occurs inter-step, but not anymore. Andreas Blass, Benjamin Rossman and the ...
Adaptive security, while more realistic as an adversarial model, is typically much harder to achieve compared to static security in cryptographic protocol design. Universal composi...