Mixed double precision for PPO algorithm #155

lopatovsky · 2024-06-10T16:38:07Z

Mixed precision

Motivation:

Inspired by RLGames, we implemented automatic mixed precision (AMP) to speed up PPO training.

Sources:

https://pytorch.org/docs/stable/amp.html

Speed eval:


Library	Mixed precision	Time (s)	Slowdown factor (baseline: RLGames with mixed precision)
RLGames	No	448	1.322x
RLGames	Yes	339	1.000x (baseline)
SKRL	No	475	1.401x
SKRL	Yes	373	1.100x
SKRL	Yes *	358	1.056x

* In this run, mixed precision was also used for inference during the data collection phase.
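
The slowdown factors in the table are each run's time divided by the baseline time (RLGames with mixed precision, 339 s). A short sketch of that arithmetic, using only the timings reported above; the dictionary keys and the helper name `slowdown_factors` are made up for illustration:

```python
# Wall-clock times in seconds, copied from the table above.
times = {
    ("RLGames", "no AMP"): 448,
    ("RLGames", "AMP"): 339,
    ("SKRL", "no AMP"): 475,
    ("SKRL", "AMP"): 373,
    ("SKRL", "AMP + inference"): 358,
}


def slowdown_factors(times, baseline_key):
    """Return each run's time relative to the baseline run, rounded to 3 decimals."""
    base = times[baseline_key]
    return {key: round(t / base, 3) for key, t in times.items()}


factors = slowdown_factors(times, ("RLGames", "AMP"))
for (library, config), factor in factors.items():
    print(f"{library:8s} {config:16s} {factor:.3f}x")
```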

Quality eval:

We trained a policy for our task multiple times with each configuration. We didn’t observe any statistically significant difference in the quality of the final results.

lopatovsky added 3 commits June 10, 2024 18:35

Add mixed precision option into ppo algorithm

696a9f0

Expand mixed precision to forward passes during data sampling phase

3d5ba05

Merge with main

06fbf2e

lopatovsky force-pushed the ll_mixed_precision branch from 9c98abe to 06fbf2e Compare July 15, 2024 11:17

lopatovsky changed the base branch from main to develop July 15, 2024 11:23

Merge branch 'develop' into ll_mixed_precision

4567e80