The connection between Neuroscience & Reinforcement Learning

This article establishes the link between dopamine, a chemical deeply involved in reward processing in the mammalian brain, and temporal-difference (TD) errors: the activity of dopamine neurons appears to follow the TD error.

1. Basics of Neuroscience, Dopamine and TD Error


Neurons, the main components of nervous systems, are cells specialized for processing and transmitting information using electrical and chemical signals.

A neuron’s output consists of sequences of electrical pulses called action potentials that travel along the axon. Action potentials are also called spikes, and a neuron that generates one is said to fire. In an artificial neural network (ANN), the loose analogue is the activation function, which determines how strongly a unit is activated by its inputs.


Dopamine is produced as a neurotransmitter by neurons whose cell bodies lie mainly in two clusters in the midbrain of mammals: the substantia nigra pars compacta (SNpc) and the ventral tegmental area (VTA).

Dopamine plays an essential role in many brain processes, such as motivation, learning and action selection, and it is implicated in most forms of addiction and in Parkinson’s disease.

TD error

If one had to identify one idea as central and novel to reinforcement learning, it would undoubtedly be temporal-difference (TD) learning. TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.

Like Monte Carlo methods, TD methods can learn directly from raw experience without a model of the environment’s dynamics. Like DP, TD methods update estimates based in part on other learned estimates, without waiting for a final outcome.

Roughly speaking, Monte Carlo methods use an estimate of equation 6.3 as the update target, whereas DP methods use an estimate of equation 6.4 (both numbered as in the Sutton and Barto book listed at the end).
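The two equations are not reproduced in this text. Assuming the numbering follows the Sutton and Barto book listed at the end, they are the two forms of the state-value function shown below, where G_t is the return following time t and pi is the policy being evaluated. This is a reconstruction from the book, not from the original post.

```latex
\begin{align*}
v_\pi(s) &\doteq \mathbb{E}_\pi\left[ G_t \mid S_t = s \right] && \text{(6.3)} \\
         &= \mathbb{E}_\pi\left[ R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s \right] && \text{(6.4)}
\end{align*}
```

Monte Carlo methods replace the expected return in the first line with a sampled return. DP methods use the second line but substitute the current estimate V for the unknown true value function. TD methods do both: they sample the next reward and state, and they use V(S(t+1)) in place of the true value.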

Finally, note that the quantity driving the TD update is a sort of error: it measures the difference between the estimated value of the state at time t, V(S(t)), and the better estimate R(t+1) + gamma*V(S(t+1)). This quantity is called the TD error.

TD error: delta(t) = R(t+1) + gamma*V(S(t+1)) - V(S(t))
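As a rough illustration of the TD error in code, here is a minimal Python sketch of a single tabular TD(0) update. The function names, the two-state example, and the step size `alpha` and discount `gamma` values are assumptions chosen for illustration, not taken from the article.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """TD error: reward + gamma * V(next state) - V(current state)."""
    return reward + gamma * value_next - value_current


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to values[state] in place and return the TD error."""
    delta = td_error(reward, values[next_state], values[state], gamma)
    # Move the current estimate a small step toward the better estimate.
    values[state] += alpha * delta
    return delta


# Hypothetical two-state example: moving from "A" to "B" yields a reward of 0.5.
values = {"A": 0.0, "B": 1.0}
delta = td0_update(values, "A", reward=0.5, next_state="B")
print("TD error:", delta)
print("updated V(A):", values["A"])
```

A positive TD error means the outcome was better than the current estimate predicted, so the estimate is raised. A negative one lowers it.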

2. Experimental Support

Romo and Schultz (1990) and Schultz and Romo (1990) took the first steps toward the reward prediction error hypothesis by recording the activity of dopamine neurons and muscle activity while monkeys moved their arms.

They trained two monkeys to reach from a resting hand position into a bin containing a bit of apple when the monkey saw and heard the bin’s door open. The monkey could then grab the food and bring it to its mouth. After a monkey became good at this, it was trained on two additional tasks.

The purpose of the first task was to see what dopamine neurons do when movements are self-initiated. The bin was left open but covered from above so that the monkey could not see inside but could reach it from below.

Here they found that the activity of dopamine neurons was not related to the monkey’s movements, but a large percentage of these neurons produced phasic responses whenever the monkey first touched a food morsel. This is good evidence that the neurons were responding to the food and not to other aspects of the task.

The purpose of the second task was to see what happens when movements are triggered by stimuli. This task used a different bin with a movable cover. The sight and sound of the bin opening triggered movements to the bin. The results of this task show that dopamine neurons no longer responded to the food; instead, they responded to the sight and sound of the opening cover of the food bin.

These observations suggest that dopamine neurons were responding neither to the initiation of movement nor to the sensory properties of the stimuli, but were rather signalling an expectation of reward.

So the response of dopamine neurons at time t corresponds to the TD error, not to the reward R(t) itself. Once a stimulus reliably predicts the reward, the reward is no longer surprising, and the response moves to the earliest predictive stimulus.
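The shift of the response from the food to the predictive stimulus can be reproduced with a small TD(0) simulation. The sketch below is a toy model under stated assumptions, not a model of the actual experiment: each trial is a fixed chain of states following a cue, the reward arrives at the end of the chain, the value of the pre-cue state is held at zero because the cue arrives unpredictably, and the names and parameter values (`run_trials`, `n_steps`, `alpha`, `gamma`) are hypothetical.

```python
def run_trials(n_trials=500, n_steps=5, alpha=0.1, gamma=1.0, reward=1.0):
    """Return the per-transition TD errors of the first and the last trial.

    State 0 is the pre-cue state, states 1..n_steps follow the cue, and
    state n_steps + 1 is terminal. The reward arrives on the final transition.
    """
    values = [0.0] * (n_steps + 2)
    history = []
    for _ in range(n_trials):
        deltas = []
        for state in range(n_steps + 1):
            next_state = state + 1
            r = reward if next_state == n_steps + 1 else 0.0
            delta = r + gamma * values[next_state] - values[state]
            # Assumption: the cue is unpredictable, so the pre-cue value stays 0.
            if state != 0:
                values[state] += alpha * delta
            deltas.append(delta)
        history.append(deltas)
    return history[0], history[-1]


first, last = run_trials()
# Index 0 is the cue onset; the last index is the reward delivery.
print("first trial:", [round(d, 2) for d in first])
print("last trial: ", [round(d, 2) for d in last])
```

In the first trial the only nonzero TD error occurs at reward delivery. After training, the TD error at reward delivery is close to zero and the TD error at cue onset is close to the reward size, which mirrors the recordings described above.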


The neural pathways involved in the brain’s reward system are complex and incompletely understood, but neuroscience research directed toward understanding these pathways and their roles in behaviour is progressing rapidly. This research is revealing striking correspondences between the brain’s reward system and the theory of reinforcement learning. The key point of this article is that the responses of dopamine neurons act as reinforcement signals, not reward signals.


Reinforcement Learning: An Introduction

Book by Richard S. Sutton and Andrew G. Barto


