We present an algorithm for on-line, incremental discovery of temporal-difference (TD) networks. The key contribution is the establishment of three criteria to expand a node in TD...
We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
This paper treats tracking as a foreground/background classification problem and proposes an online semisupervised learning framework. Initialized with a small number of labeled ...
Internet routing is mostly based on static information; its dynamicity is limited to reacting to changes in topology. Adaptive performance-based routing decisions would not o...
Ioannis C. Avramopoulos, Jennifer Rexford, Robert ...
Recent interest in the use of software character agents raises the issue of how many agents should be used in online learning. In this paper we review evidence concerning the rela...