Journal ArticleDOI
Learning from delayed rewards
TLDR
The invention relates to a circuit for use in a receiver which can receive two-tone/stereo signals which is intended to make a choice between mono or stereo reproduction of signal A or of signal B and vice versa.About:
This article is published in Robotics and Autonomous Systems.The article was published on 1995-10-01. It has received 2861 citations till now. The article focuses on the topics: Autonomous system (mathematics) & Robotics.read more
Citations
More filters
Journal ArticleDOI
Deep learning in neural networks
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, review deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
Book
Genetic Programming: On the Programming of Computers by Means of Natural Selection
TL;DR: This book discusses the evolution of architecture, primitive functions, terminals, sufficiency, and closure, and the role of representation and the lens effect in genetic programming.
Book
Dynamic Programming and Optimal Control
TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methododogy of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
Journal ArticleDOI
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units that are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reInforcement tasks, and they do this without explicitly computing gradient estimates.
Journal ArticleDOI
Ant colony system: a cooperative learning approach to the traveling salesman problem
TL;DR: The results show that the ACS outperforms other nature-inspired algorithms such as simulated annealing and evolutionary computation, and it is concluded comparing ACS-3-opt, a version of the ACS augmented with a local search procedure, to some of the best performing algorithms for symmetric and asymmetric TSPs.
Related Papers (5)
Human-level control through deep reinforcement learning
Mastering the game of Go with deep neural networks and tree search
David Silver,Aja Huang,Chris J. Maddison,Arthur Guez,Laurent Sifre,George van den Driessche,Julian Schrittwieser,Ioannis Antonoglou,Veda Panneershelvam,Marc Lanctot,Sander Dieleman,Dominik Grewe,John Nham,Nal Kalchbrenner,Ilya Sutskever,Timothy P. Lillicrap,Madeleine Leach,Koray Kavukcuoglu,Thore Graepel,Demis Hassabis +19 more