
Reinforcement Learning (RL) Project

Project Description

We employ the Actor-Critic, REINFORCE with Baseline, and episodic n-step SARSA algorithms to learn an optimal policy for three distinct Markov Decision Processes (MDPs) from the OpenAI Gym library: MountainCar-v0, Acrobot-v1, and CartPole-v1. We experimented systematically with both the hyperparameters and the model architecture, and we report results for the most effective configuration.
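All three algorithms share the same episodic structure: roll out an episode under the current policy, then update the learned parameters from the collected transitions. The loop below is a minimal sketch of that structure using a hypothetical stand-in environment with a Gym-style `reset`/`step` interface (the names `ToyEnv` and `run_episode` are illustrative, not taken from the project files):

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (reset/step API)."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon  # episode ends after `horizon` steps
        return float(self.t), reward, done, {}


def run_episode(env, policy):
    """Roll out one episode; return the trajectory and the total return."""
    trajectory, total = [], 0.0
    state, done = env.reset(), False
    while not done:
        action = policy(state)
        next_state, reward, done, _ = env.step(action)
        trajectory.append((state, action, reward))
        total += reward
        state = next_state
    return trajectory, total


# Trivial policy that always selects action 1.
traj, ret = run_episode(ToyEnv(), policy=lambda s: 1)
```

With the real Gym environments, `ToyEnv()` would be replaced by `gym.make("CartPole-v1")` (note that newer Gym/Gymnasium releases change the `reset`/`step` return signatures, so the exact unpacking depends on the installed version).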

Technical Skills

Python, OpenAI Gym, PyTorch, Matplotlib, Jupyter Notebook

Dependencies

  • OpenAI Gym: `!pip install gym`
  • PyTorch (check CPU/GPU compatibility): https://pytorch.org/get-started/locally/
  • NumPy: `!pip install numpy`
  • Matplotlib: `!pip install matplotlib`

File Contents

  • Actor Critic Final.py: implements the Actor-Critic algorithm, which combines policy approximation (the Actor) with value-function approximation (the Critic) to improve learning efficiency.
  • REINFORCE with Baseline Final.py: implements the REINFORCE policy-gradient algorithm with a baseline, which reduces the variance of the gradient estimates.
  • Semi-Gradient-SARSA Final.py: implements the semi-gradient SARSA algorithm, a temporal-difference method that updates action-value (Q) estimates to optimize the policy.
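As a concrete illustration of the last of these, here is a minimal sketch of a single semi-gradient SARSA(0) update with a linear action-value function, w ← w + α · (r + γ·Q(s′, a′) − Q(s, a)) · x(s, a). All names are illustrative and the project files use richer function approximators; this just shows the shape of the update:

```python
def q_value(w, features):
    """Linear action-value estimate: Q(s, a) = w . x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, features))


def sarsa_update(w, x_sa, reward, x_next_sa, alpha=0.1, gamma=0.99, done=False):
    """One semi-gradient SARSA(0) step on the weight vector w."""
    # TD target bootstraps from Q(s', a') unless the transition is terminal.
    target = reward + (0.0 if done else gamma * q_value(w, x_next_sa))
    td_error = target - q_value(w, x_sa)
    # Semi-gradient: the gradient of a linear Q w.r.t. w is just x(s, a).
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x_sa)]


# Toy example: 2-dimensional features, terminal transition with reward 1.
w = [0.0, 0.0]
w = sarsa_update(w, x_sa=[1.0, 0.0], reward=1.0, x_next_sa=[0.0, 0.0], done=True)
# td_error = 1.0 - 0.0 = 1.0, so w becomes [0.1, 0.0]
```

The n-step variant used in the project replaces the one-step target with a discounted sum of n rewards plus a bootstrapped Q-value n steps ahead; the weight update itself is unchanged.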