Listwise Reward Estimation for Offline Preference-based Reinforcement Learning (ICML 2024)


This is the official implementation of LiRE.

This repository contains the offline preference-based RL datasets and the scripts to reproduce the experiments.

The code is based on:

  • CORL: an offline reinforcement learning library that provides single-file implementations of offline RL algorithms.
  • PEBBLE: an online preference-based reinforcement learning method. We used its SAC implementation to create the new offline preference-based RL datasets.

Please visit our paper and project page for more details.

Installation

1. Install from the conda environment file:
  conda env create -f LiRE.yml
  pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld
  pip install git+https://github.com/denisyarats/dmc2gym.git
  pip install gdown
  sudo apt install unzip
2. Alternatively, install the packages manually:
  conda create -n LiRE python=3.9
  conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  pip install "gym[mujoco_py,classic_control]==0.23.0"
  pip install pyrallis rich tqdm==4.64.0 wandb==0.12.21
  pip install git+https://github.com/denisyarats/dmc2gym.git
  pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld
  pip install gdown
  sudo apt install unzip
  • Troubleshooting
    • AttributeError: module 'numpy' has no attribute 'int'
      • In .../LiRE/lib/python3.9/site-packages/dmc2gym/wrappers.py, change dim = np.int(np.prod(s.shape)) to dim = int(np.prod(s.shape)).
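This error appears because NumPy 1.24 removed the deprecated np.int alias (deprecated since 1.20); the builtin int behaves identically here. A minimal illustration of the fix, with a made-up observation shape:

```python
import numpy as np

# NumPy 1.24 removed the deprecated alias np.int; use the builtin int instead.
shape = (4, 84, 84)  # hypothetical observation shape, for illustration only

# Old (raises AttributeError on NumPy >= 1.24):
#   dim = np.int(np.prod(shape))
dim = int(np.prod(shape))  # works on every NumPy version
print(dim)  # 28224
```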

Algorithms

In this repo, you can run MR, SeqRank, and LiRE.

For the other baselines, we used the following repositories:

  Algorithm  URL
  PT         https://github.com/csmile-1006/PreferenceTransformer
  DPPO       https://github.com/snu-mllab/DPPO
  IPL        https://github.com/jhejna/inverse-preference-learning
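For intuition, listwise reward estimation fits a reward model to an entire ranked list of segments rather than to independent pairs. The sketch below scores a ranking with a Plackett-Luce style negative log-likelihood; it is an illustrative stand-in under that assumption, not the exact objective from the paper (the function name and NumPy implementation are ours):

```python
import numpy as np

def plackett_luce_nll(scores):
    """NLL of an observed ranking under a Plackett-Luce model.

    scores: reward-model outputs for the segments, ordered from most
    to least preferred. A lower NLL means the reward model agrees
    more strongly with the observed ranking.
    """
    s = np.asarray(scores, dtype=float)
    nll = 0.0
    for i in range(len(s) - 1):
        z = np.exp(s[i:] - s[i:].max())  # numerically stable softmax terms
        nll -= np.log(z[0] / z.sum())    # prob. that item i ranks first among the rest
    return nll

# A reward model that agrees with the ranking gets a lower (better) loss:
print(plackett_luce_nll([3.0, 2.0, 1.0]) < plackett_luce_nll([1.0, 2.0, 3.0]))  # True
```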

Dataset

For more details, please see here

  • MetaWorld
  • DMControl

Scripts

Please see here
