Reinforcement Learning Explained: How AI Learns Like Humans (Trial and Error)

In the world of AI and machine learning, there are many ways to ‘teach’ a computer. There’s Supervised Learning, where we give it millions of labeled examples (this is a picture of a cat, this is a picture of a dog). There’s also Unsupervised Learning, where we let the AI find its own patterns in unlabeled data. But there’s another method that I think is the coolest and most similar to how humans learn: Reinforcement Learning (RL).

RL is about learning from experience, from trial and error. Just like when we were kids learning to ride a bike. No one gave us a manual, right? We tried pedaling, fell (got negative feedback), tried again, adjusted our balance, fell again, until finally we could ride straight (got positive feedback). Our brains automatically learned which actions produced a ‘reward’ (being balanced) and which actions produced a ‘punishment’ (falling).

This is the basic principle that Reinforcement Learning tries to emulate.

Key Components in Reinforcement Learning

To avoid confusion, imagine we’re teaching a virtual dog to fetch a stick. In the RL world, there are several main players:

  • Agent: This is the ‘learner’. In our example, the agent is the virtual dog.
  • Environment: This is the ‘world’ where the agent interacts. It can be a game, a simulation, or even the real world. Here, the environment is a room with a stick.
  • State: This is the current situation or condition. For example, the state could be the position of the dog and the position of the stick in the room.
  • Action: This is an action that can be taken by the agent. Examples: run forward, turn left, grab the stick.
  • Reward: This is the feedback the agent receives after performing an action. It can be positive (reward) or negative (punishment). For example, if the dog successfully grabs the stick, we give it a reward of +100. If it hits a wall, we give it a reward of -10. If it just stands still, the reward is -1 (to motivate it to move).
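To make these components concrete, here is a minimal sketch of the dog-and-stick example as a toy environment. Everything here is hypothetical and simplified: the ‘room’ is just positions 0 through 4 on a line, the stick sits at the far end, and the class and method names (`StickFetchEnv`, `step`) are my own, not from any particular library.

```python
class StickFetchEnv:
    """A toy 1-D 'room' where a virtual dog (the agent) must reach a stick.

    States are positions 0..size-1; the stick sits at the last position.
    This mirrors the reward scheme above: +100 for grabbing the stick,
    -10 for bumping into a wall, -1 otherwise (to motivate movement).
    """

    def __init__(self, size=5):
        self.size = size
        self.stick = size - 1
        self.state = 0  # the dog starts at position 0

    def step(self, action):
        """Apply an action ('left', 'right', 'stay'); return (state, reward, done)."""
        if action == "right":
            self.state = min(self.state + 1, self.size - 1)
        elif action == "left":
            self.state = max(self.state - 1, 0)
        if self.state == self.stick:
            return self.state, 100, True    # grabbed the stick: big reward, episode over
        if action == "left" and self.state == 0:
            return self.state, -10, False   # bumped into the wall
        return self.state, -1, False        # small cost per step / for standing still
```

Notice how each call to `step` packages up the whole interaction loop: the agent picks an action, the environment moves to a new state, and a reward comes back as feedback.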

The main goal of the agent is to learn a ‘strategy’ (called a Policy) that can maximize the total reward it collects over time. At first, the virtual dog doesn’t know anything. It will move randomly. It hits a wall (reward -10), ouch that hurts. It runs away from the stick (reward -1), bored. Hey, it accidentally runs towards the stick and grabs it (reward +100), wow, happy! Through thousands or millions of trials, this dog will learn that “approaching and grabbing the stick is the action that produces the highest reward”.
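The ‘thousands of trials’ idea above can be sketched with tabular Q-learning, one of the classic RL algorithms (the blog describes trial-and-error learning in general; Q-learning is just one concrete way to do it). Everything here is a self-contained toy: positions 0..4 on a line with the stick at position 4, and hyperparameters (`alpha`, `gamma`, `epsilon`) picked arbitrarily for illustration.

```python
import random

N_STATES = 5
ACTIONS = ["left", "right"]
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

# Q[(state, action)] estimates the total reward expected after taking
# that action in that state. It starts at zero: the dog knows nothing.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Toy environment: move along the line, return (next_state, reward, done)."""
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    if nxt == N_STATES - 1:
        return nxt, 100, True    # grabbed the stick
    if action == "left" and state == 0:
        return nxt, -10, False   # bumped into the wall
    return nxt, -1, False        # small cost per step

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit what we know, sometimes explore randomly
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt

# The learned policy: in each state, take the action with the highest Q-value
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

After training, the greedy policy heads ‘right’ (toward the stick) from every non-terminal position: the dog has learned, purely from rewards, that approaching the stick pays off.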

Where is RL Used?

This cool learning method isn’t just for teaching virtual dogs. Its applications in the real world are incredibly sophisticated. You’ll probably like this too, because some of them have already beaten humans in their field:

  • Playing Games: This is the most popular ‘laboratory’ for RL. Remember AlphaGo from DeepMind, which beat the world champion of Go, Lee Sedol? AlphaGo learned to play Go largely by playing against itself millions of times. It found strategies that humans had never thought of in thousands of years. The same applies to AIs that are good at playing complex games like Dota 2 (OpenAI Five) or StarCraft II.
  • Robotics: How do you teach a robot arm to pick up objects of random shapes? It’s very difficult if programmed manually. With RL, the robot arm can learn itself through trial and error in a simulated environment. It will try thousands of ways to hold something, and finally find the most efficient way.
  • Self-Driving Cars: Autonomous cars can use RL to learn to make complex decisions on the road, such as when to overtake or how to merge onto a highway smoothly, with the goal of maximizing ‘rewards’ in the form of safety and time efficiency.
  • Content Recommendations: Recommendation systems on YouTube or Netflix can use RL to learn to give you the next video. If you watch a recommended video to the end (positive reward), the system will learn to give similar recommendations. If you skip in the first 5 seconds (negative reward), the system learns not to recommend videos like that again.

Challenges Ahead

RL sounds like a solution to all problems, right? But the challenges are also great. This method requires a huge amount of data (experience). AlphaGo played itself millions of times, far more games than any human could play in a lifetime. The trial-and-error process in the real world can also be expensive and dangerous. You can’t let a self-driving car learn by crashing millions of times on real roads, right? Therefore, most RL training is done in highly realistic simulation environments.

Even so, Reinforcement Learning remains one of the most promising pillars in the future of AI. Its ability to learn independently and find creative solutions beyond human thinking is the key to creating true artificial intelligence.
