Sciweavers

82 search results - page 11 / 17
ML
2002
ACM
133 views, Machine Learning
15 years 5 months ago
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...
KDD
2007
ACM
178 views, Data Mining
16 years 6 months ago
Practical learning from one-sided feedback
In many data mining applications, online labeling feedback is only available for examples which were predicted to belong to the positive class. Such applications include spam filt...
D. Sculley
ATAL
2010
Springer
15 years 7 months ago
Learning context conditions for BDI plan selection
An important drawback to the popular Belief, Desire, and Intentions (BDI) paradigm is that such systems include no element of learning from experience. In particular, the so-calle...
Dhirendra Singh, Sebastian Sardiña, Lin Pad...
RAS
2010
117 views
15 years 4 months ago
Extending BDI plan selection to incorporate learning from experience
An important drawback to the popular Belief, Desire, and Intentions (BDI) paradigm is that such systems include no element of learning from experience. We describe a novel BDI exe...
Dhirendra Singh, Sebastian Sardiña, Lin Pad...
CORR
2004
Springer
103 views, Education
15 years 5 months ago
Online convex optimization in the bandit setting: gradient descent without a gradient
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c1, c2, ..., and in each period, we choose a feasible po...
Abraham Flaxman, Adam Tauman Kalai, H. Brendan McM...