Sciweavers

» LEO - DB2's LEarning Optimizer
ML
2002
ACM
15 years 6 months ago
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...
ICML
2005
IEEE
16 years 7 months ago
A support vector method for multivariate performance measures
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. Taking a multivariate prediction approach, we give an algo...
Thorsten Joachims
ICML
2001
IEEE
16 years 7 months ago
Some Theoretical Aspects of Boosting in the Presence of Noisy Data
This is a survey of some theoretical results on boosting obtained from an analogous treatment of some regression and classification boosting algorithms. Some related papers include...
Wenxin Jiang
ALT
2008
Springer
16 years 3 months ago
Online Regret Bounds for Markov Decision Processes with Deterministic Transitions
We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...
Ronald Ortner
ICRA
2009
IEEE
16 years 1 months ago
Least absolute policy iteration for robust value function approximation
Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers...
Masashi Sugiyama, Hirotaka Hachiya, Hisashi Kashim...