In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermedi...
Agents can benefit from contracting some of their tasks that cannot be performed by themselves or that can be performed more efficiently by other agents. Developing an agent's ...
An issue that is critical for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochas...
The multiarmed bandit problem is a typical example of the dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
We propose a distributed mechanism for finding websurfing strategies that is inspired by the StumbleUpon recommendation engine. Each day, a websurfer visits a sequence of websites ...