EricLee8/CS489-ReinforcementLearning-Project

CS489 Reinforcement Learning: Project

Project Introduction

This is the code for the course project of CS489: Reinforcement Learning.
The project requires implementing two kinds of model-free RL methods: value-based and policy-based. The chosen algorithms must solve two benchmark suites, Atari games and Mujoco robots, which are discrete-control and continuous-control tasks, respectively.

Usage

To install the dependencies, run:

$ pip install -r requirements.txt

Note that gym[atari], tb-nightly, future and mujoco_py cannot be installed with the command above. For the first three, run:

$ pip install gym[atari]

$ pip install tb-nightly future

For mujoco_py, you need to get a license.

After installing all dependencies, you can train a model with:

$ python run.py --env_name BreakoutNoFrameskip-v4

Note that my code only supports the 7 environments listed in the Results section!
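As a rough illustration of how an environment name could be checked and mapped to an algorithm, here is a minimal sketch. It assumes that the seven supported environments are the seven named in the Results tables, and that Atari environments use DQN while Mujoco environments use SAC, as described in Methods. The names `ATARI_ENVS`, `MUJOCO_ENVS` and `pick_algorithm` are hypothetical and are not taken from run.py.

```python
# Hypothetical helper, not part of the repository: maps an environment
# name to the algorithm family described in the Methods section.
ATARI_ENVS = {
    "BreakoutNoFrameskip-v4",
    "PongNoFrameskip-v4",
    "BoxingNoFrameskip-v4",
}
MUJOCO_ENVS = {"Hopper-v2", "Humanoid-v2", "HalfCheetah-v2", "Ant-v2"}


def pick_algorithm(env_name):
    """Return "DQN" for Atari names, "SAC" for Mujoco names."""
    if env_name in ATARI_ENVS:
        return "DQN"
    if env_name in MUJOCO_ENVS:
        return "SAC"
    # Assumption: any other name is rejected rather than silently accepted.
    raise ValueError(f"Unsupported environment: {env_name}")
```

For example, `pick_algorithm("Hopper-v2")` returns `"SAC"`, and an unknown name raises `ValueError`.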

I also provide trained models for demos and testing. You can run a demo like this:

$ python test.py --env_name BreakoutNoFrameskip-v4 --num_episode 10

Note that when num_episode=1, the environment is rendered during the test, so make sure you are running in a graphical interface.
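The flag handling described above could look roughly like the sketch below. Only the flag names `--env_name` and `--num_episode` and the "render when there is exactly one episode" rule come from this README. The default values, the `parse_args` function and the `render` attribute are assumptions for illustration, and the actual test.py may be organized differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags shown above; defaults here are assumptions."""
    parser = argparse.ArgumentParser(description="Run a trained model.")
    parser.add_argument("--env_name", default="BreakoutNoFrameskip-v4")
    parser.add_argument("--num_episode", type=int, default=10)
    args = parser.parse_args(argv)
    # Render only for a single-episode demo, as described in the note above.
    args.render = args.num_episode == 1
    return args
```

Calling `parse_args(["--num_episode", "1"])` yields `render == True`, while any other episode count leaves rendering off.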

Methods

For the Atari games, I chose DQN with a few optimizations, such as losing-life-stopping (which works especially well for Breakout) and frame skipping. For the Mujoco robots, I first tried A3C and PPO (PPO2) but got poor results, so I switched to SAC (Soft Actor-Critic), which performed better.
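The two Atari tricks can be sketched as environment wrappers. This is a minimal illustration, not the code in this repository. It assumes that losing-life-stopping means reporting the end of an episode to the agent whenever a life is lost, and that frame skipping means repeating each action for a fixed number of frames while summing the rewards. It uses the older Gym-style `step` signature `(obs, reward, done, info)` and a hypothetical `ToyLivesEnv` in place of a real Atari emulator, so it runs without any dependencies.

```python
class ToyLivesEnv:
    """Hypothetical stand-in for an Atari game: one life is lost every
    `frames_per_life` frames, and every frame gives a reward of 1."""

    def __init__(self, lives=3, frames_per_life=5):
        self.start_lives = lives
        self.frames_per_life = frames_per_life

    def reset(self):
        self.lives = self.start_lives
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        if self.t % self.frames_per_life == 0:
            self.lives -= 1
        return self.t, 1.0, self.lives == 0, {"lives": self.lives}


class SkipFrame:
    """Repeat each action for `skip` frames and sum the rewards."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total += reward
            if done:
                break
        return obs, total, done, info


class LifeLossTerminal:
    """Report `done` on life loss, but only really reset on game over."""

    def __init__(self, env):
        self.env = env
        self.game_over = True
        self.lives = None
        self.last_obs = None

    def reset(self):
        if self.game_over:
            self.last_obs = self.env.reset()
            self.lives = None  # learned from `info` on the first step
            self.game_over = False
        # After a mere life loss, continue from the current frame.
        return self.last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.last_obs = obs
        self.game_over = done
        lives = info.get("lives", self.lives)
        life_lost = self.lives is not None and lives < self.lives
        self.lives = lives
        return obs, reward, done or life_lost, info


def episode_returns(env, max_episodes=10):
    """Play agent-visible episodes until the underlying game is over."""
    returns = []
    for _ in range(max_episodes):
        env.reset()
        total, done = 0.0, False
        while not done:
            _, reward, done, _ = env.step(0)
            total += reward
        returns.append(total)
        if env.game_over:
            break
    return returns
```

With `LifeLossTerminal(SkipFrame(ToyLivesEnv(), skip=4))`, one three-life game is split into three short agent-visible episodes. One limitation of this sketch is that a life lost during the very first step after a real reset goes undetected, because the life count is only learned from that step's `info`.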

Results

The results are presented below. Because of limited time and computing resources, some environments could have reached better results with more training. For example, the Humanoid-v2 model was still improving after 3M steps, but I initially had to stop training there for lack of time. (Update: I have since trained Humanoid-v2 for 10M steps, and it achieved a better result!)

Atari Games

| Environment Name | Average Testing Score | Training Steps |
| --- | --- | --- |
| BreakoutNoFrameskip-v4 | 416.4±38.6 | 10M |
| PongNoFrameskip-v4 | 20.7±0.5 | 10M |
| BoxingNoFrameskip-v4 | 96.3±3.1 | 10M |

Mujoco Robots

| Environment Name | Average Testing Score | Training Steps |
| --- | --- | --- |
| Hopper-v2 | 4132.8±30.9 | 3M |
| Humanoid-v2 | 7304.6±24.7 | 10M |
| HalfCheetah-v2 | 15875.6±36.4 | 3M |
| Ant-v2 | 6978.2±75.1 | 3M |
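Scores in the "mean±spread" form used above can be produced from a list of per-episode test returns. The sketch below assumes the spread is a population standard deviation, which this README does not confirm, and the input numbers in the usage note are made up for illustration rather than taken from any run. The function name `summarize` is hypothetical.

```python
import statistics


def summarize(scores):
    """Format episode returns as "mean±std" with one decimal place.

    Assumption: the spread is the population standard deviation.
    """
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    return f"{mean:.1f}±{std:.1f}"
```

For instance, `summarize([10, 20, 30])` gives `"20.0±8.2"`.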

Training Reward Pictures

Atari Games

Mujoco Robots
