Operant Conditioning AO1 AO2 AO3

SKINNER (1955)
OPERANT CONDITIONING EXPLAINS LEARNING BY CONSEQUENCES

This theory was developed by B. F. Skinner, an American psychologist. It is sometimes called “Skinnerian” Conditioning after him. Skinner carried out his original research on rats, but the conclusions were applied to humans by American behaviourist psychologists. Skinner carried out hundreds of experiments and published widely; a 2002 survey ranked him as the most influential psychologist of the 20th century.

This theory is significant for students in other ways:

It shows how scientific research proceeds. Skinner’s discoveries about animal behaviour were generalised to humans based on evolutionary theory (that humans and other animals learn through similar mechanisms). This in turn led to the behaviourist school in Psychology.
It illustrates features of Learning Theory, since it studies behaviour as a response to external stimuli without taking cognitions into account.
It ties in to your Key Question in Learning Theory, since it helps explain anorexia.
It is important for you to understand how Social Learning Theory developed out of this theory

LEARNING BY CONSEQUENCES

Not all of our behaviour is involuntary, like “knee-jerk reactions”; some of it is voluntary, and we know exactly what we’re doing. Voluntary behaviour can be learned too, because we notice the consequences of our actions, and this affects how we behave the next time we are in the same situation.

Operant Conditioning tells us that behaviour is based on A-B-C (Antecedents, Behaviour, Consequences), so if you want to change behaviour, you must change the antecedents (what happens before the behaviour) or the consequences (what happens after it); it's much easier to change the consequences.

Reinforcement is when the desired behaviour is rewarded. This makes it more likely to be repeated.

Positive reinforcement rewards the desired behaviour by adding something pleasant – food, affection, a compliment, money.

Negative reinforcement rewards the desired behaviour by removing something unpleasant – taking away pain or distress, stopping criticism, cancelling a fine.

There’s also primary reinforcement, which is when the reward is something we want naturally – a basic need such as food, warmth or affection. Secondary reinforcement is a reward we have learned to value – like money.

Punishment is when undesirable behaviour produces unpleasant consequences. Again, there is positive punishment, which punishes the undesirable behaviour by adding something unpleasant (a shock, a criticism, copying out lines), and negative punishment, which punishes by removing something pleasant (being 'grounded', deducting money, removing the Xbox).

Often, punishment combines both types: a detention involves adding something unpleasant (work) and taking away something pleasant (your break time).

Skinner found punishment to be less effective at changing behaviour than reinforcement.

Effective conditioning must be contingent and contiguous.

Contingent means that there is a clear link between the person’s behaviour and the consequence it produces – they know exactly what they are being rewarded or punished for.
Contiguous means that the consequence follows soon after the behaviour – if there’s too long a delay, the conditioning is weakened.

The 'four quadrants' of operant conditioning

Notice that punishment doesn’t help achieve the desired behaviour – it just makes the undesired behaviour less likely. If it is the only attention someone gets, punishment may actually work as positive reinforcement, since a naughty boy might regard the attention he is getting from a parent or teacher as a reward, even if the attention consists of criticisms.

Skinner’s research suggests that reinforcement shapes behaviour better than punishment, and positive reinforcement shapes it better than negative reinforcement.

Despite what Sheldon tells Leonard, Skinner found negative reinforcement to be LESS effective than positive reinforcement at changing behaviour.

RESEARCH INTO OPERANT CONDITIONING
THE FINDINGS OF STUDIES

Skinner carried out research on animals, most famously on rats. He placed the animals in a “Skinner box”, which contained a lever, a light and a food dispenser. If the rat pressed the lever, the light came on and a food pellet rolled down a chute. This is positive reinforcement. At first the rat would press the lever accidentally.

However, the consequence was contiguous (the food was dispensed instantly) and contingent (the light coming on alerted the rat to what it had done). Rats quickly learned to press the lever to get food.

One light tells the rat that food is ready (the ANTECEDENT) and another tells the rat that food has arrived (the CONSEQUENCE).

In a variation on this, Skinner electrified the floor of the Skinner Box and arranged for pressing the lever to turn the electric current off for 30 seconds. This shows negative reinforcement since the rat is learning to remove something painful. Skinner found that the rats learned to press the lever, but not as quickly as the rats that were positively reinforced.

Skinner (1948) carried out a famous experiment called “Superstition in the Pigeon”. Eight pigeons were starved to make them hungry, then put in a cage. At regular intervals (every 15 seconds), a food dispenser would swing into the cage for 5 seconds, then swing out again, regardless of what the pigeons were doing. When the food was due to appear, the pigeons started showing strange behaviours, such as turning anticlockwise or making swaying motions.

Skinner concluded that the pigeons were repeating whatever behaviour they had been in the middle of doing when the reinforcement was first offered to them. Because the food kept reappearing, this senseless behaviour was strengthened. This resembles human “superstition”, where people imagine that by doing something senseless (knocking on wood, crossing their fingers) they can make something pleasant happen.


SCHEDULES OF REINFORCEMENT

A lot of Skinner’s research investigated how often a reward needs to happen for behaviour to be learned. He identified four “schedules” of reinforcement that work:

Fixed Interval: The reward turns up at a regular time. Desirable behaviour increases in the run-up to the reward. This happened with Skinner’s pigeons. It might happen with humans at work if there is a regular tea break or “casual Friday”. Learning speed is moderate, and so is extinction (the learned behaviour fading).
Variable Interval: The reward turns up but you can’t be sure exactly when. An example might be the audience applauding a performer or cheering an athlete. Desirable behaviour increases more slowly but stays at a steady rate. Learning is fast but extinction is slow.
Fixed Ratio: The reward turns up after the desired behaviour has been carried out a set number of times. Skinner’s rats got a reward every time they pressed the lever. A human might get paid for every 100 products they build. If you don’t do the behaviour, you get nothing; if you work fast, you get a lot. Learning is fast and extinction is moderate.
Variable Ratio: The reward is dispensed randomly, after a changing number of behaviours, such as feeding the rat after one lever-press, then after 5, then after 3. For humans, this might be like a slot machine because you don’t know how many times you’ll have to pay in before it pays out. Learning is fast and extinction is slow.

APPLYING OPERANT CONDITIONING (AO2)
CONDITIONING in the real world

Shaping Behaviour

Skinner also looked into shaping behaviour. Shaping involves changing the reinforcement to produce very precise behaviours.

At first, you reward any behaviour in the general direction you want
Later, you reward behaviours that are similar to the specific behaviour you have in mind
Eventually, you only reward the specific behaviour you are looking for

This is often used to train animals that appear in TV shows and films, like Uggie, the dog in the Oscar-winning film The Artist.

Phobias

Phobias can be explained by Operant Conditioning in a number of ways. If the feared thing is removed when you scream and cry, then fearful behaviour is negatively reinforced (it removes something unpleasant). If other people show concern, share their own fears or even just pay attention, then fear is positively reinforced too (it adds something pleasant).

This idea of shaping also appears in systematic desensitisation. If someone has a phobia of spiders, you might reward them at first for looking at pictures of spiders, then at a spider in the same room but far away, and eventually for handling a spider. This is why systematic desensitisation uses Classical AND Operant Conditioning.

Token Economy Programmes

TEPs are a treatment based on Operant Conditioning. They involve rewarding (and perhaps punishing) people by awarding or deducting tokens. The tokens may be vouchers or plastic chips. Tokens are only secondary reinforcers; when there are enough of them, they can be ‘cashed in’ for things the person values directly, like gifts, luxuries or privileges.

TEPs are often used in schools (“House Points”) but are also successful in prisons and workplaces. They are used in a clinical setting to help people overcome addictions or resist antisocial behaviour.

For a TEP to be effective, there must be a well-known list of what behaviours are rewarded and how many tokens they are worth. In a clinical setting, the patient might help design this list; in a prison or school, it might be chosen by the authorities to bring about the behaviour they desire. There also needs to be a well-known ‘exchange rate’ of what can be bought with tokens. Finally, staff need to be trained to award the tokens consistently and fairly.

Prof. Dumbledore completely ruins Hogwarts' House Point system by changing the conditions for receiving tokens after the behaviour.

EVALUATING OPERANT CONDITIONING (AO3)
CODA

Credibility

There’s a lot of research in support of Operant Conditioning, including Skinner’s (1948) study of pigeons. This research isn’t just from the mid-20th century; it continues to the present day. Brain imaging has identified “reward centres” in the brain that activate during positive reinforcement – these are linked to the brain’s motivational centres.

Moreover, a lot of this research is strictly scientific, being carried out on animals in lab conditions or using brain imaging techniques like MRI. Because the theory only looks at behaviours (rather than cognitions), every step in the conditioning process is observable. This adds to the credibility of the theory, since you can see it happen with your own eyes.

Token Economy Programmes became popular in the 1970s and have been shown to work. For example, Hobbs & Holt (1976) showed that TEPs reduce antisocial behaviour in a juvenile detention centre.

Objections

Although research on rats and pigeons shows conditioning taking place, generalising the conclusions to human learning is not so clear-cut. For one thing, there are other learning theories – Classical Conditioning and Social Learning Theory – and it is usually difficult to tell which one is mainly responsible when something is learned. For example, a phobia may be formed through association AND because the consequences were unpleasant – such as when someone is bitten by a dog and develops a fear of dogs.

The theory focuses entirely on the nurture side of the nature/nurture debate. It is possible some people are born with predispositions towards behaviours, rather than learning them through conditioning. This might explain why some people turn to crime or develop musical talent without being reinforced.

The theory also focuses entirely on behaviours and ignores cognitions. Cognitions are thought-processes and include things like personality, willpower and motivation. Sigmund Freud argued that a lot of self-destructive behaviour comes from hidden thought-processes in the unconscious mind, which are not learned and cannot be unlearned so easily.

Differences

Operant Conditioning has many similarities with Classical Conditioning. Both were based on lab studies done on animals – dogs for Pavlov, rats for Skinner. Both then generalise the conclusions about learning to human beings. Both have produced effective treatments for problem behaviours: aversion therapy and systematic desensitisation for Classical Conditioning, and Token Economy Programmes for Operant Conditioning.

Classical Conditioning explains the acquisition of involuntary behaviours, things that are “knee jerk reactions”. However, Operant Conditioning explains how behaviours are learned by their consequences and better explains more deliberate, voluntary behaviours.

Social Learning Theory is quite different from Operant Conditioning. For one thing, it includes cognitions as well as behaviours. SLT explains a child learning to talk by watching and imitating adults, whereas Operant Conditioning suggests the child needs to have each word or phrase rewarded with praise or attention; SLT seems more realistic, because children learn to speak quickly and their parents don’t pay attention to everything they say.

However, Operant Conditioning and SLT overlap in vicarious learning. This is where a person sees a role model being reinforced for their behaviour; you are MUCH more likely to imitate behaviour you see being reinforced. This combines imitation (SLT) and reinforcement (Operant Conditioning).

Applications

Operant Conditioning has many applications in therapy, especially in treating more deliberate problems like addictions and crime.

Systematic desensitisation works by positively reinforcing early behaviours in the direction of the feared object, then gradually shaping behaviour through positive reinforcement. Eventually, the patient will be able to handle something they used to have a phobia about.

Token Economy Programmes (TEPs) use positive reinforcement to modify behaviour in a closed setting like a school, hospital or prison. They may also be used with addicts or mental health patients, so long as the patient agrees to the TEP and has a say in what the tokens are acquired for and what they can be spent on. Mestel & Concar (1994) reported a successful programme to reward cocaine addicts who stayed ‘clean’ with vouchers for local shops.

EXEMPLAR ESSAY
How to write an 8-mark answer

Evaluate Operant Conditioning as a theory of learning. (8 marks)

An 8-mark “evaluate” question awards 4 marks for AO1 (Describe) and 4 marks for AO3 (Evaluate).

Description
B. F. Skinner proposed that learning happens through reinforcement. In his studies involving rats and pigeons, repeated reinforcement led to conditioning.
Positive reinforcement adds something pleasant whenever you carry out the desired behaviour; negative reinforcement takes away something unpleasant.
It is important that the reinforcement is contingent (clearly linked to the desired behaviour) and contiguous (taking place soon after the desired behaviour).
Operant Conditioning is on the nurture side of the nature/nurture debate because it suggests that all behaviour comes from reinforcement rather than innate predispositions.

Evaluation
Operant Conditioning is supported by lab research on animals, such as Skinner’s studies on rats that learned to press levers when rewarded with food.
This research seems to generalise to humans. Token Economy Programmes have improved the behaviour of prisoners according to Hobbs & Holt (1976).
The theory ignores the nature side of the nature/nurture debate, since people may be born with certain predispositions that don’t respond to reward.
Operant Conditioning might also be too simplistic since it ignores motives and personality. There is a cognitive side to human behaviour that is recognised by Social Learning Theory instead.

Conclusion
Operant Conditioning explains a lot of voluntary behaviour and it has led to therapies like Token Economy Programmes that have helped people with addictions. However, it’s not a complete explanation of why people behave the way that they do.

Apply Operant Conditioning.

A 4-mark “apply” question awards 4 marks for AO2 (Application) and gives you a piece of stimulus material.

The Governor of Markdale prison has recently had problems in managing the behaviour of the prisoners. The prison service has recommended using token economy programmes (TEP) as a technique to control behaviour.

Explain why a token economy programme might be a good idea at Markdale Prison. (4 marks)

TEPs are based on Operant Conditioning which is a credible psychological theory backed up by a lot of research.
If the prisoners receive a token when they behave well (like keeping their cell clean), this is positive reinforcement and they will do it again.
The prisoners could exchange the tokens for family visits or time out of their cell, which would be primary reinforcement because these are basic needs.
A study by Hobbs & Holt (1976) showed that TEP reduced bad behaviour when it was used in a youth detention centre.
