juliusfrost/minerl-rllib

MineRL RLlib Benchmark

Here we benchmark various reinforcement learning algorithms available in RLlib on the MineRL environment.

RLlib is an open-source library for reinforcement learning that offers both high scalability and a unified API for a variety of applications. RLlib natively supports TensorFlow, TensorFlow Eager, and PyTorch, but most of its internals are framework agnostic.

Installation

Make sure you have JDK 1.8 installed on your system; MineRL requires it.

Requires Python 3.7 or 3.8.

Use a conda virtual environment

conda create --name minerl-rllib python=3.8
conda activate minerl-rllib

Install dependencies

pip install poetry
poetry install

Install PyTorch with the correct CUDA version for your system.

How to Use

Data

Make sure the environment variable MINERL_DATA_ROOT is set; otherwise it defaults to the data folder.
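The lookup described above amounts to a standard environment-variable fallback. A minimal sketch (the variable name and the `data` default come from the text above; the variable name `data_root` is illustrative):

```python
import os

# Fall back to the local "data" folder when MINERL_DATA_ROOT is unset
data_root = os.environ.get("MINERL_DATA_ROOT", "data")
print(data_root)
```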

Downloading the MineRL dataset

Follow the official instructions: https://minerl.io/dataset/
If you download the data to ./data, you don't need to set MINERL_DATA_ROOT in your environment variables.

Training

Training takes just one command. Run python train.py --help to see all options.

python train.py -f path/to/config.yaml

For example, the following command trains the SAC algorithm on offline data in the MineRLObtainDiamondVectorObf-v0 environment.

python train.py -f config/sac-offline.yaml

Configuration

This repository comes with a modular configuration system. Configuration yaml files follow the RLlib specification; see the RLlib documentation for details. Check out the config/ directory for more example configs.

You can specify the minerl-wrappers configuration arguments with the env_config setting. See the minerl-wrappers documentation for the config options of the different wrappers.

training-run-name:
  ...
  config:
    ...
    env: MineRLObtainDiamondVectorObf-v0
    env_config:
      # use diamond wrappers from minerl-wrappers
      diamond: true
      diamond_config:
        gray_scale: true
        frame_skip: 4
        frame_stack: 4
      # This repo-exclusive API discretizes the action space by calculating the kmeans actions
      # from the minerl dataset for the chosen env. Kmeans results are cached to data location.
      kmeans: true
      kmeans_config:
        num_actions: 30
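The kmeans option above clusters the continuous (obfuscated) action vectors from the dataset into num_actions centroids, and each continuous action is then replaced by the index of its nearest centroid. A minimal self-contained sketch of that idea, independent of this repo's actual implementation (all names, the toy data, and the plain Lloyd's-algorithm k-means are illustrative assumptions):

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, p):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda i: sqdist(centroids[i], p))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm over lists of equal-length vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids

# Toy stand-in for dataset action vectors (2-D, two rough modes)
actions = [[i * 0.1, 1.0] for i in range(50)] + \
          [[5.0, i * 0.1] for i in range(50)]

centroids = kmeans(actions, k=4)
# Discretize a new continuous action to a centroid index in [0, 4)
discrete_action = nearest(centroids, [5.0, 2.0])
```

Caching the centroids per environment, as the comment in the config notes, avoids re-clustering the dataset on every run.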