
Learning from a Learner

Implements the code from the LfL paper (http://proceedings.mlr.press/v97/jacq19a/jacq19a.pdf).

Grid worlds

To reproduce the soft policy inversion results for experiment 6.1 (table 1), run

python soft_policy_inversion.py

To reproduce the trajectory-based SPI results for experiment 6.1 (table 1), run

python trajectory_spi.py
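
For intuition only, here is a toy sketch of the policy-inversion idea. It is not the estimator implemented in soft_policy_inversion.py; it assumes a simplified learner that improves its policy with a KL-regularized soft update, pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(A_k(s, a) / alpha). Under that assumption, the scaled log-ratio of two successive policies recovers the learner's advantage up to a per-state constant, and the advantage equals the true reward up to potential-based shaping.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, alpha = 5, 3, 0.1

# A hypothetical learner: some current policy and some (centered) advantage.
advantage = rng.normal(size=(n_states, n_actions))
advantage -= advantage.mean(axis=1, keepdims=True)
logits_k = rng.normal(size=(n_states, n_actions))
pi_k = np.exp(logits_k) / np.exp(logits_k).sum(axis=1, keepdims=True)

# One KL-regularized (soft) improvement step of the learner.
logits_k1 = np.log(pi_k) + advantage / alpha
pi_k1 = np.exp(logits_k1) / np.exp(logits_k1).sum(axis=1, keepdims=True)

# "Inversion": recover the advantage, up to a per-state constant,
# from the two observed policies alone.
recovered = alpha * (np.log(pi_k1) - np.log(pi_k))
recovered -= recovered.mean(axis=1, keepdims=True)

print(np.allclose(recovered, advantage))  # True

The actual scripts additionally handle trajectories sampled from the policies and the shaping term; this sketch only shows why two successive policies carry reward information.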

Mujoco

Paper results were obtained with mujoco_py version 1.50.1.

Learning agents are trained via Proximal Policy Optimization (PPO).

The PPO and LfL code relies on PyTorch for automatic differentiation.

We adapted the PPO implementation by Ilya Kostrikov, available at https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail.

To reproduce the MuJoCo results from the paper:

  1. Generate learner trajectories by running python learner.py
  2. Infer the reward function by running python lfl.py
  3. Train the observer with the inferred reward by running python observer.py (a sketch of this step is given below)
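
As a rough illustration of step 3, one way to train an observer on a learned reward is to wrap the environment so that its reward is replaced by the inferred model before handing it to PPO. This is only a sketch: it assumes the classic Gym API (4-tuple step) and a hypothetical reward_model(obs, action) callable standing in for whatever lfl.py produces; the actual interface in observer.py may differ.

import gym
import torch


class InferredRewardWrapper(gym.Wrapper):
    """Replaces the environment reward with the output of a learned reward model."""

    def __init__(self, env, reward_model):
        super().__init__(env)
        self.reward_model = reward_model

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        with torch.no_grad():
            obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
            act_t = torch.as_tensor(action, dtype=torch.float32).unsqueeze(0)
            reward = self.reward_model(obs_t, act_t).item()
        return obs, reward, done, info


# Hypothetical usage: load the reward model inferred in step 2, then train
# the observer's PPO agent on the wrapped environment as usual.
# env = InferredRewardWrapper(gym.make("HalfCheetah-v2"), reward_model)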
