Peter Dayan

Researcher at Max Planck Society

Publications -  495
Citations -  75361

Peter Dayan is an academic researcher at the Max Planck Society. The author has contributed to research on reinforcement learning and computer science, has an h-index of 100, and has co-authored 460 publications receiving 65492 citations. Previous affiliations of Peter Dayan include the University of California, Los Angeles and the Wellcome Trust Centre for Neuroimaging.

Papers
Journal Article

Technical Note: Q-Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
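For orientation, the following is a minimal sketch of tabular Q-learning, not code from the paper. The three-state chain environment, the `step` helper, the discount factor, and the constant step size are all hypothetical choices made for illustration. Sampling state-action pairs uniformly mirrors the requirement that all actions be repeatedly sampled in all states, and the lookup table mirrors the discrete representation of action-values. The constant step size is an assumption that suffices here only because the toy environment is deterministic.

```python
import random

GAMMA = 0.9       # discount factor (assumed for illustration)
ALPHA = 0.5       # constant step size (assumed; fine for this deterministic toy)
N_STATES = 3      # states 0, 1, 2; state 2 is terminal
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Hypothetical deterministic chain: reward 1 on entering the terminal state."""
    next_state = state + 1 if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(n_updates=5000, seed=0):
    """Tabular Q-learning with uniformly sampled (state, action) pairs."""
    rng = random.Random(seed)
    # Discrete lookup table of action-values; the terminal row stays at zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(n_updates):
        s = rng.randrange(N_STATES - 1)  # any non-terminal state
        a = rng.choice(ACTIONS)          # every action keeps being sampled
        s2, r = step(s, a)
        # Move Q(s, a) toward the one-step target r + gamma * max_a' Q(s', a').
        target = r + GAMMA * max(q[s2])
        q[s][a] += ALPHA * (target - q[s][a])
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, [round(v, 3) for v in values])
```

In this toy chain the optimal action-values can be worked out by hand (1 for moving right from state 1, 0.9 for moving right from state 0, and 0.81 for either leftward move), so the learned table can be compared against them.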
Journal Article

A Neural Substrate of Prediction and Reward

TL;DR: Findings in this work indicate that dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events can be understood through quantitative theories of adaptive optimizing control.
Book

Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems

Peter Dayan, +1 more
TL;DR: This text introduces the basic mathematical and computational methods of theoretical neuroscience and presents applications in a variety of areas including vision, sensory-motor integration, development, learning, and memory.
Journal Article

Technical Note: Q-Learning

TL;DR: In this article, it is shown that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action values are represented discretely.
Journal Article

Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control

TL;DR: This work considers dual-action choice systems from a normative perspective, and suggests a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate.