Add test for GAE + rename `RolloutBuffer.dones` for clarification #375

araffin · 2021-03-31T18:22:20Z

Description

Added a test for the GAE computation
Added a comment explaining the return computation
Renamed `dones` to `episode_starts` for clarity

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)

see #105

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

I've read the CONTRIBUTION guide (required)
I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code using make format (required)
I have checked the codestyle using make check-codestyle and make lint (required)
I have ensured make pytest and make type both pass. (required)
I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

araffin · 2021-03-31T18:35:05Z

Note: after some quick trials on CartPole-v1, the formulation `self.returns = self.advantages + self.values` seems to give consistently better results.

This needs to be confirmed with a Breakout/Pong run, I think (or any harder env; maybe HalfCheetahBullet would do the trick).

EDIT: this is kind of confirmed by the error in CI "AssertionError: Mean reward below threshold: 87.60 < 90.00"

Miffyli · 2021-04-02T21:20:31Z

Could you provide references for the "classical" way of computing returns that you mention? While this is indeed not what you usually see, the one used inside the GAE computation should be theoretically valid (e.g. if you set gae_lambda to one extreme, you get the normal discounted return).
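To make the `gae_lambda` remark concrete, here is a minimal sketch (plain NumPy, not the SB3 implementation; the function names are hypothetical and it assumes a single complete episode whose terminal value is 0). It computes GAE with the usual backward recursion, forms `returns = advantages + values`, and checks that `gae_lambda = 1` recovers the ordinary discounted return:

```python
import numpy as np


def gae_single_episode(rewards, values, gamma=0.99, gae_lambda=0.95):
    """GAE over one complete episode; the value after the last step is 0 (terminal)."""
    n = len(rewards)
    advantages = np.zeros(n)
    last_gae = 0.0
    for t in reversed(range(n)):
        next_value = values[t + 1] if t + 1 < n else 0.0
        # One-step TD error
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of the TD errors that follow step t
        last_gae = delta + gamma * gae_lambda * last_gae
        advantages[t] = last_gae
    returns = advantages + values
    return advantages, returns


def discounted_returns(rewards, gamma=0.99):
    """Plain Monte Carlo discounted return for one complete episode."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


rng = np.random.default_rng(0)
rewards = rng.normal(size=8)
values = rng.normal(size=8)  # arbitrary value estimates

_, returns_lam1 = gae_single_episode(rewards, values, gae_lambda=1.0)
mc = discounted_returns(rewards)
print(np.allclose(returns_lam1, mc))
```

With `gae_lambda = 1` the value terms cancel pairwise, so the value estimates drop out of the return entirely; for any smaller `gae_lambda` the return depends on them.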

araffin · 2021-04-02T22:19:08Z

> Could you provide references on what would be the "classical" way of computing returns you mention?

GAE paper: equation 28
https://arxiv.org/abs/1506.02438
Sutton & Barto book, Section 9.3, "Stochastic-gradient and Semi-gradient Methods": both the MC and TD(0) estimates are shown

I guess using GAE(lambda) + value ends up being the TD(lambda) target, so it should be fine, but it needs a comment, no?
(but still a bit weird...)

araffin · 2021-04-02T22:39:00Z

Looking at the original commit (openai/baselines@da99706), the comment says "Compute target value using TD(lambda) estimator, and advantage with GAE(lambda)", so yes it is using TD(lambda) and not MC for value estimation.

The line of code:
https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo1/pposgd_simple.py#L66

Miffyli · 2021-04-02T22:49:41Z

@araffin

That is very good to know, and it is also good to know that your TD(0) experiment (as I understood it) had lower results (which was to be expected, as I understood that TD(lambda) should work better) 👍. A comment on this would make things clear, and could even mention that this is used because it was used by OpenAI Baselines and SB2.

Edit: I mixed TD(0) and TD(1) up, see below.

araffin · 2021-04-02T22:53:02Z

> That is very good to know, and also good to know your TD(0) experiment (as I understood it)

I tested MC estimates (as it is usually the most common way)

> that this is used because it was used by openai baselines and SB2.

This is in the docstring, but it is not so clear. I will update it ;)

araffin · 2021-04-04T16:21:21Z

So, I took some time and found the answer in David Silver's lecture 4, slide 47 (page 51 in the PDF), "Telescoping in TD(λ)". It comes from a non-intuitive telescoping sum: https://www.davidsilver.uk/wp-content/uploads/2020/03/MC-TD.pdf

He shows that $G(\lambda) - V(s)$ leads to the GAE formula.
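For reference, the telescoping argument can be sketched as follows. The notation is Sutton & Barto style, which is an assumption on my side rather than the slide's exact notation, and the sketch treats $V$ as fixed, takes an infinite horizon, and assumes $\lambda < 1$:

$$
\begin{aligned}
\delta_t &= R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \\
G_t^{(n)} - V(S_t) &= \sum_{l=0}^{n-1} \gamma^l \delta_{t+l} \\
G_t^{\lambda} - V(S_t) &= (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} \sum_{l=0}^{n-1} \gamma^l \delta_{t+l} \\
&= \sum_{l=0}^{\infty} \gamma^l \delta_{t+l} \, (1-\lambda) \sum_{n=l+1}^{\infty} \lambda^{n-1} \\
&= \sum_{l=0}^{\infty} (\gamma\lambda)^l \delta_{t+l}
\end{aligned}
$$

The second line holds because the intermediate value terms of the $n$-step return cancel pairwise, and the last step uses $(1-\lambda)\sum_{n=l+1}^{\infty}\lambda^{n-1} = \lambda^l$. The final expression is the GAE($\lambda$) estimator, so adding $V(S_t)$ back to the advantage gives the $\lambda$-return, i.e. the TD($\lambda$) target.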

> That is very good to know, and also good to know your TD(0) experiment (as I understood it) had lower results (which was to be expected, as I understood TD(lambda) should work better)

TD(1) is the MC estimate (non-intuitive when looking at the definition)
TD(0) is the one-step estimate with bootstrapping
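The two extremes can be checked directly from the forward-view definition. This is a minimal sketch for a single finite episode (function names are hypothetical, and the value after the final step is assumed to be 0): the λ-return is a weighted average of n-step returns, and in a finite episode all leftover weight falls on the full Monte Carlo return, which is why λ = 1 gives MC even though the `(1 - lam)` factor looks like it zeroes everything out.

```python
import numpy as np


def n_step_return(rewards, values, t, n, gamma):
    """n-step return from step t; bootstraps with V(s_{t+n}) unless the episode has ended."""
    T = len(rewards)
    end = min(t + n, T)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if t + n < T:  # still inside the episode: bootstrap from the value estimate
        g += gamma ** n * values[t + n]
    return g


def lambda_return(rewards, values, t, gamma, lam):
    """Forward-view lambda-return for an episode that terminates after the last reward."""
    steps_left = len(rewards) - t  # n = steps_left is the full Monte Carlo return
    g = sum(
        (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
        for n in range(1, steps_left)
    )
    # All remaining weight, lam ** (steps_left - 1), goes to the Monte Carlo return
    g += lam ** (steps_left - 1) * n_step_return(rewards, values, t, steps_left, gamma)
    return g


rewards = np.array([1.0, 0.0, 2.0, -1.0])
values = np.array([0.5, -0.3, 1.2, 0.1])  # arbitrary value estimates
gamma = 0.9
T = len(rewards)

td1 = np.array([lambda_return(rewards, values, t, gamma, 1.0) for t in range(T)])
td0 = np.array([lambda_return(rewards, values, t, gamma, 0.0) for t in range(T)])

# Reference targets computed independently
mc = np.array([sum(gamma ** (k - t) * rewards[k] for k in range(t, T)) for t in range(T)])
one_step = rewards + gamma * np.append(values[1:], 0.0)

print(np.allclose(td1, mc), np.allclose(td0, one_step))
```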

Edit: see also John Schulman's tutorial, slide 19 (http://joschu.net/docs/2016-NIPS-Tutorial.pdf).

Miffyli · 2021-04-05T19:12:49Z

Added some clarifications to docs. Two major comments:

1. The GAE test is there to ensure that the GAE computation works out correctly, right? Two worries here:
   a) What is the implementation in the test based on (i.e. can we trust it)?
   b) I tried changing the code to compute GAE over three episodes (one rollout lasts for max_steps * 3). Only the first two tests pass; the remaining ones do not. I figure they should pass if the implementations were correct?
2. Should dones in the compute_returns_and_advantage arguments be renamed to last_episode_starts? This would clarify that dones and self.episode_starts have the same meaning, but sadly it means updating code around the repository to reflect this change.

Sorry for nitpicking. This has been such a headache I want to get it absolutely right this time so it can be left in peace for now :)

araffin · 2021-04-05T19:26:26Z

> a) What is the implementation in test based on (i.e. can we trust it?)

Different things. As you noticed, I took the idea from #105, so that if there is an off-by-one error, the test will fail (I did some testing trying to re-introduce the bug and it failed as expected).

Then, I did some unit testing (GAE(1) is MC estimate) while writing the test (+ manual debugging).
For the lambda-return implementation, I based it on SB3... mostly because we did a lot of testing with that code (and if it is wrong, well, it is wrong but it works ^^").

> b) I tried changing the code to compute GAE over three episodes (one rollout lasts for max_steps * 3). Only first two tests pass, but remaining do not. I figure this should pass if the implementations were correct?

Good point. I think I did not write the test for more than one episode, and in fact it would probably fail without modification (that was to keep things simple and easy to monitor).

> Should dones in compute_returns_and_advantage arguments be renamed to last_episode_starts?

I did not rename it because I think it has a different role compared to the episode start (it answers the question: is the very last step of the rollout terminal?), and it is used only once (I think I wrote that in the docstring).
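The distinction can be illustrated with a sketch of a rollout-level GAE computation. It is modeled on the behaviour described in this thread but is not the library's code: it handles a single environment, and the names `compute_returns_and_advantage_sketch`, `last_value` and `last_done` are hypothetical. Inside the rollout, `episode_starts[step + 1]` marks whether the next stored step begins a new episode; `last_done` is used exactly once, for the step that follows the end of the buffer.

```python
import numpy as np


def compute_returns_and_advantage_sketch(
    rewards, values, episode_starts, last_value, last_done, gamma=0.99, gae_lambda=0.95
):
    """GAE over one rollout that may span several episodes (single env, illustrative)."""
    n_steps = len(rewards)
    advantages = np.zeros(n_steps)
    last_gae = 0.0
    for step in reversed(range(n_steps)):
        if step == n_steps - 1:
            # Only place where the "is the very last step terminal?" flag is needed
            next_non_terminal = 1.0 - float(last_done)
            next_value = last_value
        else:
            # Inside the buffer, a new episode starting at step + 1 cuts the bootstrap
            next_non_terminal = 1.0 - float(episode_starts[step + 1])
            next_value = values[step + 1]
        delta = rewards[step] + gamma * next_value * next_non_terminal - values[step]
        last_gae = delta + gamma * gae_lambda * next_non_terminal * last_gae
        advantages[step] = last_gae
    returns = advantages + values
    return returns, advantages


# Two episodes of two steps each; the second one may or may not end with the rollout.
rewards = np.ones(4)
values = np.array([0.5, -0.25, 0.75, 0.125])
episode_starts = np.array([True, False, True, False])

open_returns, _ = compute_returns_and_advantage_sketch(
    rewards, values, episode_starts, last_value=2.0, last_done=False, gamma=1.0, gae_lambda=1.0
)
closed_returns, _ = compute_returns_and_advantage_sketch(
    rewards, values, episode_starts, last_value=2.0, last_done=True, gamma=1.0, gae_lambda=1.0
)
print(open_returns)    # [2. 1. 4. 3.]: the last episode bootstraps from last_value
print(closed_returns)  # [2. 1. 2. 1.]: the last episode ends with the rollout
```

Only the second episode's returns change between the two calls, since the first episode is already closed by `episode_starts[2]`.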

Miffyli · 2021-04-05T19:31:43Z

> Good point. I think I did not write the test for more than one episode, and in fact, it should probably fail without modification (that was to keep things simple and easy to monitor).

Since the main code should support this, I think the test should also check for it. In addition, I think it should also test reward models other than "last step rewards 1.0" (e.g. "first step rewards 1.0" and "all steps reward 1.0"), to cover edge cases and the most common cases. I have a framework for this and a multi-episode test (which fails), which I can push if you'd like.
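As a rough idea of what such a check could look like (a standalone sketch, not the test that was eventually pushed; `MAX_STEPS`, `N_EPISODES` and all function names are hypothetical): build a rollout of several fixed-length episodes for each reward model, then compare `advantages + values` at `gae_lambda = 1` against Monte Carlo returns that reset at every episode boundary.

```python
import numpy as np

MAX_STEPS = 4    # hypothetical episode length
N_EPISODES = 3   # one rollout lasts MAX_STEPS * N_EPISODES steps
GAMMA = 0.99

REWARD_MODELS = {
    "last_step": lambda t: 1.0 if t == MAX_STEPS - 1 else 0.0,
    "first_step": lambda t: 1.0 if t == 0 else 0.0,
    "all_steps": lambda t: 1.0,
}


def make_rollout(reward_fn):
    """Concatenate N_EPISODES fixed-length episodes into one rollout."""
    rewards = np.array([reward_fn(t) for _ in range(N_EPISODES) for t in range(MAX_STEPS)])
    episode_starts = np.array([t == 0 for _ in range(N_EPISODES) for t in range(MAX_STEPS)])
    return rewards, episode_starts


def monte_carlo_returns(rewards, episode_starts, gamma):
    """Reference: discounted return, reset at every episode boundary."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        last_of_episode = t == len(rewards) - 1 or episode_starts[t + 1]
        running = rewards[t] + (0.0 if last_of_episode else gamma * running)
        returns[t] = running
    return returns


def gae_returns(rewards, values, episode_starts, gamma, gae_lambda):
    """advantages + values, for a rollout that ends on a terminal step."""
    advantages = np.zeros(len(rewards))
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        is_last = t == len(rewards) - 1
        non_terminal = 0.0 if is_last or episode_starts[t + 1] else 1.0
        next_value = 0.0 if is_last else values[t + 1]
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        last_gae = delta + gamma * gae_lambda * non_terminal * last_gae
        advantages[t] = last_gae
    return advantages + values


rng = np.random.default_rng(0)
results = {}
for name, reward_fn in REWARD_MODELS.items():
    rewards, episode_starts = make_rollout(reward_fn)
    values = rng.normal(size=len(rewards))  # arbitrary critic outputs
    results[name] = np.allclose(
        gae_returns(rewards, values, episode_starts, GAMMA, 1.0),
        monte_carlo_returns(rewards, episode_starts, GAMMA),
    )
print(results)
```

An off-by-one error in how `episode_starts` is indexed would leak rewards or values across episode boundaries, which the "first step" and "last step" models are most likely to expose.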

> I did not rename it because I think it has a different role compared to the episode start (is the very last step of the rollout terminal?), and it is used only once (I think I wrote that in the docstring).

Ah right, now I see it 👍. You are right, they have different meanings. Let's leave it as is.

araffin · 2021-04-05T20:01:25Z

> I have a framework for this and a multi-episode test (which fails), which I can push if you'd like.

I would prefer if you push tests that pass :p

Miffyli · 2021-04-15T23:17:20Z

Sorry for taking this long! I have added a test for multi-episode rollout that passes. If it checks out, feel free to accept and merge 👍


araffin added 2 commits March 31, 2021 20:12

Fix return computation + add test for GAE

9928d8c

Rename last_dones to episode_starts for clarification

699da19

araffin requested a review from Miffyli March 31, 2021 18:22

araffin changed the title ~~Fix GAE + add test + rename RolloutBuffer.dones for clarification~~ Add test for GAE + rename RolloutBuffer.dones for clarification Mar 31, 2021

araffin requested a review from AdamGleave March 31, 2021 18:41

araffin added 3 commits March 31, 2021 21:17

Revert advantage

ad3549c

Cleanup test

3df2982

Merge branch 'master' into feat/rename-gae-done

95bf7dc

Miffyli reviewed Apr 2, 2021

stable_baselines3/common/base_class.py (outdated, resolved)

Miffyli reviewed Apr 2, 2021

docs/misc/changelog.rst (outdated, resolved)

araffin and others added 3 commits April 4, 2021 18:46

Rename variable

a230e11

Clarify return computation

dc57482

Clarify docs

24ea677

araffin and others added 2 commits April 10, 2021 14:05

Merge branch 'master' into feat/rename-gae-done

d550e5a

Add multi-episode rollout test

ff1df3e

araffin added 3 commits April 16, 2021 11:31

Merge branch 'master' into feat/rename-gae-done

51b531f

Merge branch 'feat/rename-gae-done' of github.com:DLR-RM/stable-basel…

b94c286

…ines3 into feat/rename-gae-done

Reformat

bceb914

araffin merged commit 5d47296 into master Apr 16, 2021

araffin deleted the feat/rename-gae-done branch April 16, 2021 13:52

araffin mentioned this pull request May 10, 2021

PPO variant with invalid action masking Stable-Baselines-Team/stable-baselines3-contrib#25

Merged

15 tasks

araffin mentioned this pull request May 23, 2021

Why does compute_episodic_return uses returns + advantage? thu-ml/tianshou#372

Closed

zhihanyang2022 mentioned this pull request Nov 4, 2021

[Question] Does PPO handle timeout and bootstrap correctly? #651

Closed

2 tasks

This was referenced Mar 16, 2023

Update PPO for better performance and closer alignment to sb3 StoneT2000/rl-ts#38

Closed

Ppo patch StoneT2000/rl-ts#39

Merged

qgallouedec mentioned this pull request May 28, 2024

[Question] Relationship between n_step, episode, and advantage in episodic tasks #1938

Closed

4 tasks
