For a given problem, the optimal Markov policy over a finite horizon is a conditional plan containing a potentially large number of branches. However, there are applications wher...
Traditionally, optimizers are “programmed” to optimize queries following a set of buildin procedures. However, optimizers should be robust to its changing environment to gener...
We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we wa...
The problem of radio channel assignments with multiple levels of interference can be modeled using graph theory. Given a graph G, possibly infinite, and real numbers k1, k2, . . ...
We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies...