Journal Article
Learning from delayed rewards
This article was published in Robotics and Autonomous Systems on 1995-10-01. It has received 2861 citations to date. The article focuses on the topics Autonomous system (mathematics) and Robotics.
Citations
Journal Article
On partially controlled multi-agent systems
TL;DR: In this article, the authors distinguish between two types of agents within a multi-agent system: controllable agents which are directly controlled by the system's designer, and uncontrollable agents, which are not under the designer's direct control.
Journal Article
Dynamic bipedal walking assisted by learning
Chee-Meng Chew, Gill A. Pratt
TL;DR: A general control architecture for bipedal walking based on a divide-and-conquer approach is presented, and the sagittal-plane motion-control algorithm is formulated using a control approach known as Virtual Model Control.
Journal Article
A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters
H.R. Berenji, D. Vengerov
TL;DR: This work proves that a fuzzy rulebase actor satisfies the necessary conditions that guarantee the convergence of its parameters to a local optimum, and provides the first convergence proof for fuzzy reinforcement learning (FRL).
Proceedings Article
Hierarchical learning of robot skills by reinforcement
TL;DR: It is shown how reinforcement learning can be made practical for complex problems by introducing hierarchical learning; artificial neural networks are used to generalize from experience.
Posted Content
Deep Reinforcement Fuzzing
TL;DR: This paper formalizes fuzzing as a reinforcement learning problem using Markov decision processes, which makes it possible to apply state-of-the-art deep Q-learning algorithms that optimize rewards defined from runtime properties of the program under test.
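To illustrate the kind of delayed-reward learning these summaries refer to, here is a minimal tabular Q-learning sketch. It is not the deep Q-learning setup of the fuzzing paper, which replaces the table with a neural network and derives rewards from program behaviour. The chain environment, the function name `q_learning`, and all parameter values are hypothetical choices made for illustration. The only reward arrives on reaching the final state, so earlier states learn their value purely through the bootstrapped update `Q(s, a) += alpha * (r + gamma * max Q(s', .) - Q(s, a))`.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain MDP (illustrative assumption).

    States are 0..n_states-1 and actions are 0 (left) and 1 (right).
    A reward of 1 is given only on reaching the last state, which ends the
    episode, so every earlier reward is zero: a delayed-reward problem.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            # Moving left from state 0 leaves the agent in state 0.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # The terminal state contributes no future value.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q = q_learning()
greedy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
print(greedy)
```

Under these assumptions the greedy policy should move right in every non-terminal state, and `Q(s, right)` should approach `gamma ** (goal - 1 - s)`. This shows how a single terminal reward is propagated back to the start of the chain.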
Related Papers
Human-level control through deep reinforcement learning
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis