Dyna-Q Learning and Double Q-Learning

I implemented Double Q-learning and the Dyna-Q algorithm. Dyna-Q combines model-based planning with real experience: by replaying simulated transitions from a learned model, it learns faster, as seen in the comparative graphs. Double Q-learning mitigates maximization bias by maintaining two separate Q-tables; at each step one table is chosen at random to be updated, and the greedy action it selects is evaluated using the other table. Reducing this bias improves overall performance.
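
A minimal sketch of the Double Q-learning update in a tabular setting (the hyper-parameters, the 9-action assumption, and all names here are illustrative, not taken from the repository's code):

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.2, 0.9   # assumed learning rate and discount factor
ACTIONS = list(range(9))  # assumed 9 acceleration actions for the racetrack

Q1 = defaultdict(lambda: [0.0] * len(ACTIONS))
Q2 = defaultdict(lambda: [0.0] * len(ACTIONS))

def epsilon_greedy(state, epsilon=0.1):
    """Behaviour policy: act greedily with respect to the sum of both tables."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    combined = [q1 + q2 for q1, q2 in zip(Q1[state], Q2[state])]
    return max(ACTIONS, key=lambda a: combined[a])

def double_q_update(state, action, reward, next_state):
    """With equal probability, update Q1 using Q2 to evaluate its greedy
    action, or vice versa, removing the maximization bias of the
    single-table Q-learning target."""
    if random.random() < 0.5:
        a_star = max(ACTIONS, key=lambda a: Q1[next_state][a])
        target = reward + GAMMA * Q2[next_state][a_star]
        Q1[state][action] += ALPHA * (target - Q1[state][action])
    else:
        a_star = max(ACTIONS, key=lambda a: Q2[next_state][a])
        target = reward + GAMMA * Q1[next_state][a_star]
        Q2[state][action] += ALPHA * (target - Q2[state][action])
```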

(Figure: comparative returns graphs)

After 30 episodes, the agent achieved returns similar to what Q-learning reached in 150 episodes, indicating significantly faster learning. It also achieved higher average returns, showing improved overall performance. The increase in learning speed can be attributed predominantly to the additional simulated experience generated by Dyna-Q's model planning, while Double Q-learning kept the value estimates reliable by avoiding maximization bias.
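
For reference, a minimal sketch of the Dyna-Q loop that generates this extra simulated experience (the planning budget, step size, and helper names are assumptions for illustration, not the repository's implementation):

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.2, 0.9  # assumed learning rate and discount factor
N_PLANNING_STEPS = 50    # assumed number of simulated updates per real step

Q = defaultdict(lambda: [0.0] * 9)  # assumed 9 racetrack actions
model = {}                          # (state, action) -> (reward, next_state)

def q_update(state, action, reward, next_state):
    """Standard one-step Q-learning backup."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

def dyna_q_step(state, action, reward, next_state):
    # 1. Direct RL: learn from the real transition.
    q_update(state, action, reward, next_state)
    # 2. Model learning: store the observed (deterministic) transition.
    model[(state, action)] = (reward, next_state)
    # 3. Planning: replay previously seen state-action pairs from the model.
    for _ in range(N_PLANNING_STEPS):
        s, a = random.choice(list(model.keys()))
        r, s_next = model[(s, a)]
        q_update(s, a, r, s_next)
```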

(Figure: returns over episodes)

The agent was trained on the racetrack environment provided by Dr Joshua Evans:

(Figure: racetrack environment)

References:

Racetrack environment code by Dr Joshua Evans (racetrack_env.py)

Basic Q-learning returns plot by Dr Joshua Evans (correct_returns_q.json)

Off-policy TD Control (Q-learning) algorithm (Sutton & Barto, Reinforcement Learning: An Introduction, 2018, Section 6.5, p. 131)

Double Q-learning algorithm (Sutton & Barto, Reinforcement Learning: An Introduction, 2018, Section 6.7, p. 136)

Tabular Dyna-Q algorithm (Sutton & Barto, Reinforcement Learning: An Introduction, 2018, Section 8.2, p. 164)
