This paper introduces a new approach to actionvalue function approximation by learning basis functions from a spectral decomposition of the state-action manifold. This paper exten...
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Abstract. Current point-based planning algorithms for solving partially observable Markov decision processes (POMDPs) have demonstrated that a good approximation of the value funct...
A new spectral approach to value function approximation has recently been proposed to automatically construct basis functions from samples. Global basis functions called proto-val...
We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication be...