Conditioning Algorithms: Reinforcement Learning - An Introduction
Updated: Jul 26, 2020
People have long wanted to create machines that can think, learn, and reason. Research in artificial intelligence invites us to look at algorithms and ask how far they are comparable to our human ways of thinking and reasoning.
Take reinforcement learning, for instance: it is the field of machine learning concerned with how software agents should take actions in an environment to maximize cumulative reward. It is one of the three basic machine learning paradigms, alongside supervised and unsupervised learning, and it builds on a well-known concept from educational psychology: conditioning.
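To make the idea of an agent acting in an environment a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the Corridor environment, the run_episode helper, and the two policies are hypothetical names, not part of any library. The agent starts in cell 0 and only receives a reward of 1.0 when it reaches the last cell.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the goal or the step limit is reached.

    Returns (total_reward, steps_taken).
    """
    state = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


def make_random_policy(seed=0):
    """Policy that ignores the state and moves left or right at random."""
    rng = random.Random(seed)
    return lambda state: rng.choice([-1, 1])


def always_right(state):
    """Policy that always moves toward the goal."""
    return 1


if __name__ == "__main__":
    print(run_episode(Corridor(), always_right))
    print(run_episode(Corridor(), make_random_policy()))
```

The policy that always moves right reaches the goal in the fewest steps, while the random policy wanders. A learning algorithm would sit between these two extremes and adjust the policy based on the rewards it observes.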
Pavlov: Is Conditioning Dogs and Algorithms the Same?
Pavlov's behavioral learning theory (also known as classical conditioning) states that, through learning, a new, conditional reaction can be added to a natural, mostly innate, so-called unconditional response. A well-known example is Pavlov's dog: whenever the dog was fed, a bell sounded at the same time. After a few such feedings, the dog began to salivate at the sound of the bell alone. A related paradigm of behaviorist learning psychology is operant conditioning, also called learning by success, which concerns the learning of stimulus-response patterns from originally spontaneous behavior. The frequency of a behavior is changed lastingly by its pleasant (appetitive) or unpleasant (aversive) consequences. This means that desirable behavior is reinforced by reward, and undesirable behavior is suppressed by punishment.
The same idea carries over to algorithms: we either reward the agent for desired behavior (reinforcement) or punish it for unwanted behavior (punishment). Based on that, there are two types of reinforcement:
Positive: An event that occurs because of a particular behavior. It increases the strength and frequency of that behavior and has a positive effect on the action taken by the agent.
Negative: A behavior is strengthened because it stops or avoids an adverse condition.
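How reward and punishment shift an agent's preferences can be sketched with a few lines of Python. This is only an illustration under my own assumptions: the update_value helper, the learning rate of 0.1, the two actions "sit" and "bark", and the reward values +1.0 and -1.0 are all hypothetical. Each observed consequence nudges the estimated value of an action a small step toward the reward it produced.

```python
def update_value(value, reward, learning_rate=0.1):
    """Move the estimated value of an action a step toward the observed reward."""
    return value + learning_rate * (reward - value)


# Both actions start out neutral (hypothetical example actions).
values = {"sit": 0.0, "bark": 0.0}

for _ in range(20):
    values["sit"] = update_value(values["sit"], reward=1.0)     # rewarded
    values["bark"] = update_value(values["bark"], reward=-1.0)  # punished

# The agent now prefers the action with the highest estimated value.
preferred = max(values, key=values.get)

if __name__ == "__main__":
    print(values, preferred)
```

After repeated feedback, the rewarded action ends up with a positive value and the punished one with a negative value, so an agent that picks the highest-valued action will choose the rewarded behavior more often.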
Reinforcement Learning Algorithms
Reinforcement learning algorithms, which today are often combined with neural networks, should help us attain a complex objective or maximize a particular measure over many steps. Typical applications are motion control and training systems that provide instructions and materials according to requirements. In contrast to supervised learning, the method can also be used when little labeled data is available.
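One common way to express "maximizing a measurement over numerous steps" is the discounted return, where each reward is weighted by how far in the future it arrives. The sketch below is a minimal illustration, not a full algorithm. The function name discounted_return and the discount factor gamma=0.9 are my own choices for this example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # A reward that arrives two steps from now counts for less than an immediate one.
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))
    print(discounted_return([1.0, 0.0, 0.0], gamma=0.5))
```

With a discount factor below 1, the agent prefers rewards that come sooner. With a factor of exactly 1, all rewards count equally and the return is just their sum.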
Reinforcement learning is applied in robotics for industrial automation and in business strategy planning. Reinforcement learning algorithms can be expected to improve over time, even in ambiguous, realistic environments where they have to choose among an arbitrary number of possible actions. Some even believe that this type of algorithm is the most promising path toward strong AI, provided enough data and computing power are available.
There is one major downside to the concept of conditioning: The study of learning through conditioning is strictly limited to observable behavior and does not speculate on constructs that may underlie the behavior. Therefore, it does not clarify how learning by intrinsic motivation (e.g., curiosity) works. The same is true for algorithms because they are rule-based, not rule-bound.
For this post, I have used the following resources:
Chris Nicholson's Guide (check out the extensive list of further resources such as articles, papers, videos, and links),
and my old psychology book.
Hope you've enjoyed this post - until next time and as always, stay curious!