Implementation of Twin Delayed Deep Deterministic Policy Gradient (TD3) and Proximal Policy Optimization (PPO) for continuous-control tasks in the PyBullet environments InvertedPendulumBulletEnv-v0 and HalfCheetahBulletEnv-v0.
Implementation of TD3 with behavioral cloning (TD3+BC) for offline reinforcement learning.
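
As a rough illustration of what TD3+BC changes relative to plain TD3, the sketch below computes the actor objective described in the TD3+BC paper: the usual Q-maximization term, scaled by lambda = alpha / mean(|Q|), plus a mean-squared behavioral-cloning penalty toward the dataset actions. It is not taken from this repository. The function name `td3_bc_actor_loss`, its arguments, and the default `alpha=2.5` (the value suggested in the paper) are assumptions for illustration, and plain Python lists stand in for tensors, so no gradients are computed.

```python
def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Illustrative TD3+BC actor loss on plain Python lists (hypothetical helper).

    q_values:        Q(s, pi(s)) for each sample in the batch
    policy_actions:  pi(s) for each sample, as a list of action vectors
    dataset_actions: the actions stored in the offline dataset
    alpha:           trade-off between RL and behavioral cloning (assumed default)
    """
    n = len(q_values)
    mean_q = sum(q_values) / n

    # lambda normalizes the Q term by the average Q magnitude of the batch.
    # In an autograd framework this factor would be treated as a constant.
    # Assumes the batch does not have all-zero Q-values.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / mean_abs_q

    # Behavioral-cloning term: mean squared error over all action components.
    sq_errors = [
        (p - a) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, a in zip(pa, da)
    ]
    bc_term = sum(sq_errors) / len(sq_errors)

    # Minimizing this loss maximizes Q while staying close to dataset actions.
    return -lam * mean_q + bc_term


loss = td3_bc_actor_loss(
    q_values=[1.0, 3.0],
    policy_actions=[[0.0], [1.0]],
    dataset_actions=[[0.0], [0.0]],
)
print(loss)
```

A larger `alpha` weights the Q term more heavily (closer to plain TD3), while a smaller `alpha` pulls the policy toward pure behavioral cloning.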