Journal Article

Learning from delayed rewards

Ben Kröse
01 Oct 1995, Vol. 15, Iss. 4, pp. 233-235
About
This article was published in Robotics and Autonomous Systems on 1995-10-01. It has received 2861 citations to date. The article focuses on the topics Autonomous system (mathematics) and Robotics.


Citations
Journal Article

On partially controlled multi-agent systems

TL;DR: In this article, the authors distinguish between two types of agents within a multi-agent system: controllable agents which are directly controlled by the system's designer, and uncontrollable agents, which are not under the designer's direct control.
Journal Article

Dynamic bipedal walking assisted by learning

TL;DR: A general control architecture for bipedal walking which is based on a divide-and-conquer approach is presented, and the sagittal-plane motion-control algorithm is formulated using a control approach known as Virtual Model Control.
Journal Article

A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters

TL;DR: This work proves that a fuzzy rulebase actor satisfies the necessary conditions that guarantee the convergence of its parameters to a local optimum, and provides the first convergence proof for fuzzy reinforcement learning (FRL).
Proceedings Article

Hierarchical learning of robot skills by reinforcement

TL;DR: It is shown how reinforcement learning can be made practical for complex problems by introducing hierarchical learning, with artificial neural networks used to generalize experiences.
Posted Content

Deep Reinforcement Fuzzing

TL;DR: This paper formalizes fuzzing as a reinforcement learning problem using the concept of Markov decision processes, which allows state-of-the-art deep Q-learning algorithms to optimize rewards defined from runtime properties of the program under test.
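Deep Q-learning methods, such as those mentioned in the last entry, build on the tabular one-step Q-learning update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. The sketch below shows that update under some assumptions. It uses a hypothetical six-state chain where the only reward arrives at the far end, which makes it a delayed reward. The environment, the function name `q_learning_chain`, and all parameter values are illustrative assumptions. They are not taken from any of the listed papers.

```python
import random

def q_learning_chain(n_states=6, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular one-step Q-learning on a hypothetical toy chain MDP.

    States are 0..n_states-1; action 0 moves left (clamped at 0) and
    action 1 moves right. Entering the last state yields reward 1 and
    ends the episode; every other transition yields 0, so the reward
    is delayed until the end of the chain.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    terminal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != terminal:
            # Epsilon-greedy selection; break ties between equal values randomly.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == terminal else 0.0
            # Bootstrap from the best next action unless the episode has ended.
            target = r if s_next == terminal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

After training, the greedy policy moves right from every non-terminal state. The value of moving right from state s approaches γ raised to the number of remaining steps after the first one, which shows how the single terminal reward propagates backward through the chain.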