We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies...
The conditional distribution of a discrete variable y, given another discrete variable x, is often specified by assigning one multinomial distribution to each state of x. The cost...
We present a trainable sequential-inference technique for processes with large state and observation spaces and relational structure. Our method assumes "reliable observation...
Hierarchical reinforcement learning is a general framework which attempts to accelerate policy learning in large domains. On the other hand, policy gradient reinforcement learning...
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech t...
Andrew McCallum, Dayne Freitag, Fernando C. N. Per...