Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It ...
Decentralized partially observable Markov decision processes (DEC-POMDPs) form a general framework for planning for groups of cooperating agents that inhabit a stochastic and part...
Matthijs T. J. Spaan, Geoffrey J. Gordon, Nikos A....
In this work we present a method for the estimation of a rank-one pattern living in two heterogeneous spaces, when observed through a mixture in multiple observation sets. Using a ...
Ronald Phlypo, Nisrine Jrad, Bertrand Rivet, Marco...
As organizations begin to deploy large computational grids, it has become apparent that systems for observation and control of the resources, services, and applications that make ...
This paper derives a near-optimal distributed Kalman filter to estimate a large-scale random field monitored by a network of N sensors. The field is described by a sparsely con...