Journal ArticleDOI

Solving two‐armed Bernoulli bandit problems using a Bayesian learning automaton

TLDR
This paper reports research into a completely new family of solution schemes for the TABB problem, the Bayesian learning automaton (BLA) family, based on merely counting rewards/penalties combined with random sampling from a pair of twin Beta distributions.
Abstract
Purpose – The two‐armed Bernoulli bandit (TABB) problem is a classical optimization problem in which an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in either a reward or a penalty. The reward probability of each arm is unknown, so one must balance exploiting existing knowledge about the arms against obtaining new information. The purpose of this paper is to report research into a completely new family of solution schemes for the TABB problem: the Bayesian learning automaton (BLA) family.

Design/methodology/approach – Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. BLA avoids the problem of computational intractability by not explicitly performing the Bayesian computations. Rather, it is based upon merely counting rewards/penalties, combined with random sampling from a pair of twin Beta distributions. This is intuitively appealing since the Bayesian conjugate prior for a bino...
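The counting-plus-sampling scheme the abstract describes can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the function name, parameters, and the uniform Beta(1, 1) prior are assumptions made for the example.

```python
import random

def bayesian_learning_automaton(reward_probs, horizon, seed=0):
    """Sketch of the BLA scheme: keep reward/penalty counts per arm,
    sample once from each arm's Beta posterior, and pull the arm whose
    sample is larger. (Illustrative; assumes a Beta(1, 1) prior.)"""
    rng = random.Random(seed)
    alpha = [1, 1]  # 1 + rewards observed on each arm
    beta = [1, 1]   # 1 + penalties observed on each arm
    total_reward = 0
    for _ in range(horizon):
        # Random sampling from the pair of twin Beta distributions.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(2)]
        arm = 0 if samples[0] >= samples[1] else 1
        # Pull the chosen arm: reward (1) or penalty (0).
        reward = 1 if rng.random() < reward_probs[arm] else 0
        total_reward += reward
        if reward:
            alpha[arm] += 1  # count the reward
        else:
            beta[arm] += 1   # count the penalty
    return total_reward, alpha, beta
```

No explicit Bayesian computation is performed: updating the posterior is just incrementing a counter, which is what makes the scheme computationally cheap.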


Citations
Proceedings Article

An Empirical Evaluation of Thompson Sampling

TL;DR: Empirical results using Thompson sampling on simulated and real data are presented, showing that it is highly competitive and should be part of the standard baselines to compare against.
Proceedings Article

Analysis of Thompson Sampling for the Multi-armed Bandit Problem

TL;DR: In this paper, the Thompson sampling algorithm is shown to achieve logarithmic expected regret for the stochastic multi-armed bandit problem; for the two-armed case the expected regret is O(ln T / Δ + 1/Δ³), where Δ is the gap between the arms' means.
Proceedings Article

Thompson Sampling for Contextual Bandits with Linear Payoffs

TL;DR: In this article, a generalization of Thompson sampling is proposed for the stochastic contextual multi-armed bandit problem with linear payoff functions, where the contexts are provided by an adaptive adversary, and a high-probability regret bound of O((d²/ε)√(T^(1+ε))) is shown.
Book ChapterDOI

Thompson sampling: an asymptotically optimal finite-time analysis

TL;DR: The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem is answered positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret.
Journal ArticleDOI

Combinatorial bandits

TL;DR: A variant of a strategy by Dani, Hayes, and Kakade is introduced, achieving a regret bound that, for a variety of concrete choices of S, is of order √(nd ln|S|), where n is the time horizon.
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Journal ArticleDOI

Finite-time Analysis of the Multiarmed Bandit Problem

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
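The "simple and efficient policies" this reference describes include the UCB1 index policy; a minimal sketch of UCB1 follows, with the function name and parameters chosen for illustration rather than taken from the paper.

```python
import math
import random

def ucb1(reward_probs, horizon, seed=0):
    """Sketch of UCB1: pull each arm once, then repeatedly pull the arm
    with the largest index mean_i + sqrt(2 * ln t / n_i)."""
    rng = random.Random(seed)
    k = len(reward_probs)
    counts = [0] * k   # pulls per arm
    sums = [0.0] * k   # total reward per arm

    def pull(i):
        counts[i] += 1
        sums[i] += 1.0 if rng.random() < reward_probs[i] else 0.0

    for i in range(k):          # initialization: one pull per arm
        pull(i)
    for t in range(k, horizon):
        # Upper confidence index: empirical mean plus exploration bonus.
        index = [sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
                 for i in range(k)]
        pull(max(range(k), key=lambda i: index[i]))
    return counts, sums
```

The exploration bonus shrinks as an arm accumulates pulls, which is what yields logarithmic regret uniformly over time for bounded rewards.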
Book ChapterDOI

Bandit Based Monte-Carlo Planning

TL;DR: In this article, a bandit-based Monte-Carlo planning algorithm is proposed for large state-space Markovian decision problems (MDPs); Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions in such problems.