Sciweavers

ATAL
2005
Springer
Exploiting belief bounds: practical POMDPs for personal assistant agents
Agents or agent teams deployed to assist humans often face the challenges of monitoring the state of key processes in their environment (including the state of their human users t...
Pradeep Varakantham, Rajiv T. Maheswaran, Milind T...
COR
2008
Supply disruptions with time-dependent parameters
We consider a firm that faces random demand and receives product from a single supplier who faces random supply. The supplier's availability may be affected by events such as...
Andrew M. Ross, Ying Rong, Lawrence V. Snyder
CORR
2006
Springer
A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(λ), LSTD(λ)...
Manuel Loth, Philippe Preux
CORR
2010
Springer
The Non-Bayesian Restless Multi-Armed Bandit: a Case of Near-Logarithmic Regret
In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A play...
Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Z...
DOCENG
2004
ACM
Behavioral reactivity and real time programming in XML: functional programming meets SMIL animation
XML and its associated languages are emerging as powerful authoring tools for multimedia and hypermedia web content. Furthermore, intelligent presentation generation engines have ...
Peter R. King, Patrick Schmitz, Simon J. Thompson