Apprenticeship learning via inverse reinforcement learning

doi:10.1145/1015330.1015430

Proceedings Article

- pp 1-8

TLDR

This work models the expert as maximizing a reward function that is expressible as a linear combination of known features, and gives an algorithm that learns the demonstrated task by using "inverse reinforcement learning" to try to recover the unknown reward function.

Abstract:

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using "inverse reinforcement learning" to try to recover the unknown reward function. We show that our algorithm terminates in a small number of iterations, and that even though we may never recover the expert's reward function, the policy output by the algorithm will attain performance close to that of the expert, where here performance is measured with respect to the expert's unknown reward function.
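The closing claim of the abstract (performance close to the expert's under an unknown reward) can be illustrated with a small sketch. It assumes the common formulation in which the reward is R(s) = w · phi(s) and performance is the expected discounted sum of rewards, so that a policy's value equals w · mu, where mu is its vector of discounted feature expectations. The feature map `phi`, the toy trajectories, the discount `gamma`, and the weight vector `w` below are all hypothetical and chosen only for illustration; this is not the paper's algorithm, only the inequality that makes matching feature expectations sufficient.

```python
import math


def feature_expectations(trajectories, phi, gamma):
    """Average over trajectories of sum_t gamma^t * phi(s_t)."""
    k = len(phi(trajectories[0][0]))
    mu = [0.0] * k
    for traj in trajectories:
        for t, state in enumerate(traj):
            discount = gamma ** t
            for i, f in enumerate(phi(state)):
                mu[i] += discount * f
    return [m / len(trajectories) for m in mu]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy problem: states are lane indices 0..2, features are one-hot.
def phi(state):
    return [1.0 if state == i else 0.0 for i in range(3)]


expert_trajs = [[1, 1, 1, 1], [1, 1, 2, 1]]   # made-up demonstrations
learner_trajs = [[1, 1, 1, 2], [1, 2, 1, 1]]  # made-up learner rollouts
gamma = 0.9

mu_expert = feature_expectations(expert_trajs, phi, gamma)
mu_learner = feature_expectations(learner_trajs, phi, gamma)

# Hypothetical "true" reward weights with unit L2 norm; the learner never sees them.
w = [0.0, 0.8, -0.6]

value_gap = abs(dot(w, mu_expert) - dot(w, mu_learner))
feature_gap = l2_distance(mu_expert, mu_learner)

# By Cauchy-Schwarz, value_gap <= ||w||_2 * feature_gap = feature_gap.
print(f"value gap   = {value_gap:.4f}")
print(f"feature gap = {feature_gap:.4f}")
```

Under these assumptions, any policy whose feature expectations are within epsilon of the expert's performs within epsilon of the expert for every unit-norm weight vector, which is why the true reward does not need to be recovered exactly.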

Citations

Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

On Two or Three Developments in Convex Analysis

Toru Maruyama

Journal Article

A survey of robot learning from demonstration

Brenna D. Argall, +3 more

- 01 May 2009 -

Robotics and Autonomous Systems

TL;DR: A comprehensive survey of robot Learning from Demonstration (LfD), a technique that develops policies from example state to action mappings, which analyzes and categorizes the multiple ways in which examples are gathered, as well as the various techniques for policy derivation.

Proceedings Article

Maximum entropy inverse reinforcement learning

Brian D. Ziebart, +3 more

TL;DR: Develops a probabilistic approach, based on the principle of maximum entropy, that provides a well-defined, globally normalized distribution over decision sequences while offering the same performance guarantees as existing methods.

Journal Article

Reinforcement learning in robotics: A survey

Jens Kober, +2 more

- 01 Sep 2013 -

The International Journal of Robotics Re...

TL;DR: This article attempts to strengthen the links between the two research communities by surveying work on reinforcement learning for behavior generation in robots, highlighting both key challenges in robot reinforcement learning and notable successes.

References

Book

The Nature of Statistical Learning Theory

Vladimir Vapnik

TL;DR: Covers the setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; and what is important in learning theory.

Statistical learning theory

Vladimir Vapnik

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning processes, the author covers function estimation from small data pools, applies these estimates to real-life problems, and much more.

On Two or Three Developments in Convex Analysis

Toru Maruyama

Journal Article

Algorithms for Inverse Reinforcement Learning

Andrew Y. Ng, +1 more

TL;DR: Addresses the problem of inverse reinforcement learning in Markov decision processes, that is, the problem of extracting a reward function given observed, optimal behavior.

Proceedings Article

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping

Andrew Y. Ng, +2 more

TL;DR: Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated, to shed light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.
