Open Access · Journal Article · DOI

UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

Peter Auer, +1 more
01 Oct 2010 · Vol. 61, Iss. 1, pp. 55–65
TLDR
For this modified UCB algorithm, an improved bound on the regret is given with respect to the optimal reward for K-armed bandits after T trials.
Abstract
In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by $\mathrm{const} \cdot \frac{K \log T}{\Delta}$, where $\Delta$ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of $\mathrm{const} \cdot \frac{K \log(T\Delta^2)}{\Delta}$.
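The modification in the paper is an arm-elimination scheme: arms are sampled in rounds against a geometrically shrinking gap estimate $\tilde\Delta_m = 2^{-m}$, and an arm is discarded once its upper confidence bound falls below the lower confidence bound of some other arm. The sketch below is a minimal Python rendering of that scheme under our reading of the paper; the `pull` callback, the commit rules, and the bookkeeping are our own scaffolding, not the authors' code (rewards assumed in [0, 1]):

```python
import math

def ucb_improved(pull, K, T):
    """Sketch of the arm-elimination UCB variant analyzed in the paper.
    pull(i) returns a reward in [0, 1] for arm i; K arms, horizon T."""
    active = set(range(K))   # arms not yet eliminated
    means = [0.0] * K        # empirical means
    counts = [0] * K         # pulls per arm
    delta = 1.0              # current gap estimate, halved each round
    t = 0                    # pulls used so far

    while t < T:
        if len(active) == 1:                 # one arm left: commit to it
            (i,) = active
            while t < T:
                pull(i); t += 1
            break
        if T * delta * delta < math.e:       # rounds stop once T*delta^2 < e
            i = max(active, key=lambda j: means[j])
            while t < T:
                pull(i); t += 1
            break
        # sample every active arm up to n_m total pulls
        n_m = math.ceil(2 * math.log(T * delta * delta) / delta ** 2)
        for i in active:
            while counts[i] < n_m and t < T:
                r = pull(i)
                counts[i] += 1
                means[i] += (r - means[i]) / counts[i]
                t += 1
        # drop arms whose upper bound falls below the best lower bound
        radius = math.sqrt(math.log(T * delta * delta) / (2 * n_m))
        best_lower = max(means[j] - radius for j in active)
        active = {i for i in active if means[i] + radius >= best_lower}
        delta /= 2.0
    return means, counts
```

The $\log(T\Delta^2)$ factor inside the sample count and the confidence radius is what replaces the plain $\log T$ of the original UCB in the regret bound.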


Citations
Book

Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems

TL;DR: In this monograph, the authors focus on regret analysis in the context of multi-armed bandit problems, where the player must balance staying with the option that gave the highest payoffs in the past against exploring new options that might give higher payoffs in the future.
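For reference, the pseudo-regret that these analyses bound can be written as follows (standard notation, not quoted from the monograph): if $\mu_i$ is the mean reward of arm $i$, $\mu^* = \max_i \mu_i$, $\Delta_i = \mu^* - \mu_i$, $I_t$ is the arm pulled at time $t$, and $n_i(T)$ counts the pulls of arm $i$ up to time $T$, then

$$ R_T = T\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{I_t}\Big] = \sum_{i=1}^{K} \Delta_i\,\mathbb{E}[n_i(T)]. $$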
Journal Article · DOI

Combinatorial bandits

TL;DR: A variant of a strategy by Dani, Hayes and Kakade is introduced, achieving a regret bound that, for a variety of concrete choices of S, is of order $\sqrt{nd\ln|S|}$, where n is the time horizon and d the dimension.
Proceedings Article

Almost Optimal Exploration in Multi-Armed Bandits

TL;DR: Two novel, parameter-free algorithms for identifying the best arm are presented, in two different settings: given a target confidence, and given a target budget of arm pulls. For both, upper bounds are proved whose gap from the lower bound is only doubly logarithmic in the problem parameters.
Book

Principles Of Cognitive Radio

TL;DR: A review covering the concept of cognitive radio, the capacity of cognitive radio networks, and propagation issues for cognitive radio.
Proceedings Article

The Best of Both Worlds: Stochastic and Adversarial Bandits

TL;DR: In this paper, the authors present a bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards.
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Book Chapter · DOI

Probability Inequalities for sums of Bounded Random Variables

TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent bounded random variables exceeds its mean ES by a positive amount nt; extensions to certain sums of dependent random variables, such as U-statistics, are also given.
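In its classical form (stated here from standard sources, not quoted from the chapter): for independent $X_1,\dots,X_n$ with $a_i \le X_i \le b_i$ and $S = X_1 + \dots + X_n$,

$$ \Pr\big(S - \mathbb{E}S \ge nt\big) \le \exp\!\left(-\frac{2n^2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right) \quad \text{for } t > 0. $$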
Journal Article · DOI

Finite-time Analysis of the Multiarmed Bandit Problem

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
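This is reference [4] of the abstract, whose UCB1 policy the modified algorithm builds on. A minimal sketch of the UCB1 index rule, assuming rewards in [0, 1] (the `pull` callback is our own scaffolding):

```python
import math

def ucb1(pull, K, T):
    """UCB1: play each arm once, then always play the arm maximizing
    mean_i + sqrt(2 * ln(t) / n_i). Rewards assumed in [0, 1]."""
    means = [pull(i) for i in range(K)]   # one initial pull per arm
    counts = [1] * K
    for t in range(K + 1, T + 1):
        # index = empirical mean + exploration bonus
        i = max(range(K),
                key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = pull(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
    return means, counts
```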
Journal Article · DOI

The Nonstochastic Multiarmed Bandit Problem

TL;DR: A solution is given to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
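The algorithm introduced in that paper is Exp3, exponential weighting with uniform exploration. A minimal sketch, assuming rewards in [0, 1] and a fixed exploration rate `gamma` (the `pull` callback is our own scaffolding):

```python
import math
import random

def exp3(pull, K, T, gamma=0.1):
    """Exp3-style exponential weights for adversarial bandits.
    Rewards assumed in [0, 1]; gamma mixes in uniform exploration."""
    weights = [1.0] * K
    for _ in range(T):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        i = random.choices(range(K), weights=probs)[0]
        x = pull(i)
        x_hat = x / probs[i]                       # importance-weighted reward
        weights[i] *= math.exp(gamma * x_hat / K)  # exponential update
    return weights
```

The importance-weighted estimate x / p_i keeps the reward estimates unbiased even though only the chosen arm's payoff is observed.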