Search Sciweavers | Sciweavers

3381 search results - page 368 / 677

ML
2002
ACM

133 views, Machine Learning

Finite-time Analysis of the Multiarmed Bandit Problem

15 years 6 months ago

Download homes.dsi.unimi.it

Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...

Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...

ICML
2005
IEEE

103 views, Machine Learning

A support vector method for multivariate performance measures

16 years 7 months ago

Download www.cs.cornell.edu

This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1 score. Taking a multivariate prediction approach, we give an algo...

Thorsten Joachims

ICML
2001
IEEE

173 views, Machine Learning

Some Theoretical Aspects of Boosting in the Presence of Noisy Data

16 years 7 months ago

Download newton.stats.northwestern.edu

This is a survey of some theoretical results on boosting obtained from an analogous treatment of some regression and classification boosting algorithms. Some related papers include...

Wenxin Jiang

ALT
2008
Springer

141 views, Machine Learning

Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

16 years 3 months ago

Download personal.unileoben.ac.at

We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the onl...

Ronald Ortner

ICRA
2009
IEEE

143 views, Robotics

Least absolute policy iteration for robust value function approximation

16 years 1 months ago

Download sugiyama-www.cs.titech.ac.jp

Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers...

Masashi Sugiyama, Hirotaka Hachiya, Hisashi Kashim...
