UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem
Peter Auer, Ronald Ortner
TLDR
For this modified UCB algorithm, an improved bound on the regret is given with respect to the optimal reward for K-armed bandits after T trials.
Abstract:
In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · $$\frac{K\log(T)}{\Delta}$$, where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of const · $$\frac{K\log(T\Delta^2)}{\Delta}$$.
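The index policy the abstract refers to can be illustrated with a minimal sketch of the original UCB1 rule of Auer et al. (not the modified variant analyzed in this paper), assuming rewards in [0, 1]; the `pull` callback and arm probabilities below are illustrative assumptions:

```python
import math
import random

def ucb1(pull, K, T):
    """Minimal UCB1 sketch: pull(arm) returns a reward in [0, 1].

    Plays each arm once, then repeatedly pulls the arm maximizing
    mean + sqrt(2 ln t / n), i.e. empirical mean plus confidence bonus.
    Returns the empirical means and pull counts per arm.
    """
    counts = [0] * K      # times each arm has been pulled
    means = [0.0] * K     # empirical mean reward of each arm
    for t in range(1, T + 1):
        if t <= K:        # initialization: pull every arm once
            arm = t - 1
        else:             # UCB index: exploitation term + exploration bonus
            arm = max(range(K),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return means, counts

# Example: two Bernoulli arms with (assumed) success probabilities 0.2 and 0.8
random.seed(0)
probs = [0.2, 0.8]
means, counts = ucb1(lambda i: float(random.random() < probs[i]), K=2, T=2000)
```

Under the abstract's notation, the gap Δ here is 0.6, and the expected number of pulls of the suboptimal arm grows only logarithmically in T, so the better arm dominates the pull counts.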
Citations
Book
Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems
TL;DR: In this article, the authors focus on regret analysis in the context of multi-armed bandit problems, where regret is defined as the balance between staying with the option that gave highest payoff in the past and exploring new options that might give higher payoffs in the future.
Journal ArticleDOI
Combinatorial bandits
Nicolò Cesa-Bianchi, Gábor Lugosi
TL;DR: A variant of a strategy by Dani, Hayes and Kakade is introduced, achieving a regret bound that, for a variety of concrete choices of S, is of order √(nd ln |S|), where n is the time horizon.
Proceedings Article
Almost Optimal Exploration in Multi-Armed Bandits
TL;DR: Two novel, parameter-free algorithms for identifying the best arm are presented, in two different settings: given a target confidence and given a target budget of arm pulls, for which upper bounds are proved whose gap from the lower bound is only doubly logarithmic in the problem parameters.
Book
Principles Of Cognitive Radio
TL;DR: The concept of cognitive radio, capacity of cognitive radio networks, and propagation issues for cognitive radio: a review.
Proceedings Article
The Best of Both Worlds: Stochastic and Adversarial Bandits
TL;DR: In this paper, the authors present a bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards.
References
Book
Reinforcement Learning: An Introduction
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Book ChapterDOI
Probability Inequalities for sums of Bounded Random Variables
TL;DR: In this article, upper bounds for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt are derived for certain sums of dependent random variables such as U statistics.
Journal ArticleDOI
Finite-time Analysis of the Multiarmed Bandit Problem
TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
Journal ArticleDOI
The Nonstochastic Multiarmed Bandit Problem
TL;DR: A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
Journal ArticleDOI
Asymptotically efficient adaptive allocation rules
Tze Leung Lai, Herbert Robbins