Open Access · Journal Article · DOI

UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

Peter Auer, +1 more
01 Oct 2010 · Vol. 61, Iss. 1, pp. 55–65
TLDR
For this modified UCB algorithm, an improved bound on the regret is given with respect to the optimal reward for K-armed bandits after T trials.
Abstract
In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by $\mathrm{const} \cdot \frac{K \log T}{\Delta}$, where $\Delta$ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of $\mathrm{const} \cdot \frac{K \log(T\Delta^2)}{\Delta}$.
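The modification in the paper is an arm-elimination scheme: arms are sampled in rounds against a geometrically shrinking gap estimate $\tilde\Delta_m = 2^{-m}$, and an arm is discarded once its upper confidence bound falls below the lower confidence bound of some other arm. The sketch below is a minimal Python rendering of that scheme under our reading of the paper; the `pull` callback, the commit rules, and the bookkeeping are our own scaffolding, not the authors' code (rewards assumed in [0, 1]):

```python
import math

def ucb_improved(pull, K, T):
    """Sketch of the arm-elimination UCB variant analyzed in the paper.
    pull(i) returns a reward in [0, 1] for arm i; K arms, horizon T."""
    active = set(range(K))   # arms not yet eliminated
    means = [0.0] * K        # empirical means
    counts = [0] * K         # pulls per arm
    delta = 1.0              # current gap estimate, halved each round
    t = 0                    # pulls used so far

    while t < T:
        if len(active) == 1:                 # one arm left: commit to it
            (i,) = active
            while t < T:
                pull(i); t += 1
            break
        if T * delta * delta < math.e:       # rounds stop once T*delta^2 < e
            i = max(active, key=lambda j: means[j])
            while t < T:
                pull(i); t += 1
            break
        # sample every active arm up to n_m total pulls
        n_m = math.ceil(2 * math.log(T * delta * delta) / delta ** 2)
        for i in active:
            while counts[i] < n_m and t < T:
                r = pull(i)
                counts[i] += 1
                means[i] += (r - means[i]) / counts[i]
                t += 1
        # drop arms whose upper bound falls below the best lower bound
        radius = math.sqrt(math.log(T * delta * delta) / (2 * n_m))
        best_lower = max(means[j] - radius for j in active)
        active = {i for i in active if means[i] + radius >= best_lower}
        delta /= 2.0
    return means, counts
```

The $\log(T\Delta^2)$ factor inside the sample count and the confidence radius is what replaces the plain $\log T$ of the original UCB in the regret bound.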


Citations
Book

Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems

TL;DR: In this monograph, the authors focus on regret analysis in the context of multi-armed bandit problems, where the player must balance staying with the option that gave the highest payoffs in the past against exploring new options that might give higher payoffs in the future.
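For reference, the pseudo-regret that these analyses bound can be written as follows (standard notation, not quoted from the monograph): if $\mu_i$ is the mean reward of arm $i$, $\mu^* = \max_i \mu_i$, $\Delta_i = \mu^* - \mu_i$, $I_t$ is the arm pulled at time $t$, and $n_i(T)$ counts the pulls of arm $i$ up to time $T$, then

$$ R_T = T\mu^* - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{I_t}\Big] = \sum_{i=1}^{K} \Delta_i\,\mathbb{E}[n_i(T)]. $$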
Journal Article · DOI

Combinatorial bandits

TL;DR: A variant of a strategy by Dani, Hayes and Kakade is introduced, achieving a regret bound that, for a variety of concrete choices of S, is of order $\sqrt{nd\ln|S|}$, where n is the time horizon and d the dimension.
Proceedings Article

Almost Optimal Exploration in Multi-Armed Bandits

TL;DR: Two novel, parameter-free algorithms for identifying the best arm are presented, in two different settings: given a target confidence, and given a target budget of arm pulls. For both, upper bounds are proved whose gap from the lower bound is only doubly logarithmic in the problem parameters.
Book

Principles Of Cognitive Radio

TL;DR: A review covering the concept of cognitive radio, the capacity of cognitive radio networks, and propagation issues for cognitive radio.
Proceedings Article

The Best of Both Worlds: Stochastic and Adversarial Bandits

TL;DR: In this paper, the authors present a bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards.
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Book Chapter · DOI

Probability Inequalities for sums of Bounded Random Variables

TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent bounded random variables exceeds its mean ES by a positive amount nt; extensions to certain sums of dependent random variables, such as U-statistics, are also given.
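In its classical form (stated here from standard sources, not quoted from the chapter): for independent $X_1,\dots,X_n$ with $a_i \le X_i \le b_i$ and $S = X_1 + \dots + X_n$,

$$ \Pr\big(S - \mathbb{E}S \ge nt\big) \le \exp\!\left(-\frac{2n^2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right) \quad \text{for } t > 0. $$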
Journal Article · DOI

Finite-time Analysis of the Multiarmed Bandit Problem

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
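This is reference [4] of the abstract, whose UCB1 policy the modified algorithm builds on. A minimal sketch of the UCB1 index rule, assuming rewards in [0, 1] (the `pull` callback is our own scaffolding):

```python
import math

def ucb1(pull, K, T):
    """UCB1: play each arm once, then always play the arm maximizing
    mean_i + sqrt(2 * ln(t) / n_i). Rewards assumed in [0, 1]."""
    means = [pull(i) for i in range(K)]   # one initial pull per arm
    counts = [1] * K
    for t in range(K + 1, T + 1):
        # index = empirical mean + exploration bonus
        i = max(range(K),
                key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = pull(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
    return means, counts
```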
Journal Article · DOI

The Nonstochastic Multiarmed Bandit Problem

TL;DR: A solution is given to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
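The algorithm introduced in that paper is Exp3, exponential weighting with uniform exploration. A minimal sketch, assuming rewards in [0, 1] and a fixed exploration rate `gamma` (the `pull` callback is our own scaffolding):

```python
import math
import random

def exp3(pull, K, T, gamma=0.1):
    """Exp3-style exponential weights for adversarial bandits.
    Rewards assumed in [0, 1]; gamma mixes in uniform exploration."""
    weights = [1.0] * K
    for _ in range(T):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        i = random.choices(range(K), weights=probs)[0]
        x = pull(i)
        x_hat = x / probs[i]                       # importance-weighted reward
        weights[i] *= math.exp(gamma * x_hat / K)  # exponential update
    return weights
```

The importance-weighted estimate x / p_i keeps the reward estimates unbiased even though only the chosen arm's payoff is observed.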