
Advantage Actor Critic (A2C) Model #598

Merged
merged 46 commits into from
Aug 13, 2021

Conversation

Contributor

@blahBlahhhJ blahBlahhhJ commented Mar 19, 2021

What does this PR do?

Update for #596 (issue)
Implementation of A2C model for Reinforcement Learning
[Screenshot: CartPole training performance]
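For readers unfamiliar with A2C, the core update combines a policy-gradient (actor) term weighted by the advantage with a value-regression (critic) term, usually plus an entropy bonus. A minimal sketch in PyTorch, with illustrative tensor values; this is not the PR's actual code, and the names and coefficients are assumptions:

```python
import torch
import torch.nn.functional as F

# Hypothetical per-batch tensors (illustrative values, not from the PR):
# log_probs: log pi(a|s) of the taken actions, values: critic V(s), returns: discounted returns
log_probs = torch.log(torch.tensor([0.7, 0.6, 0.9]))
values = torch.tensor([0.5, 0.8, 1.0])
returns = torch.tensor([1.0, 0.5, 1.2])

# Advantage A(s, a) = R - V(s); detached so the actor term does not backprop into the critic
advantage = (returns - values).detach()

actor_loss = -(log_probs * advantage).mean()   # policy-gradient term
critic_loss = F.mse_loss(values, returns)      # value-regression term
entropy_bonus = 0.0                            # typically -beta * entropy(pi) to encourage exploration

loss = actor_loss + 0.5 * critic_loss + entropy_bonus
print(float(loss))
```

The `0.5` critic weight and the zero entropy bonus are placeholders; real implementations expose both as hyperparameters.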

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks

pep8speaks commented Mar 19, 2021

Hello @blahBlahhhJ! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-08-13 12:58:07 UTC

@github-actions github-actions bot added the model label Mar 19, 2021
@codecov

codecov bot commented Mar 20, 2021

Codecov Report

Merging #598 (4687f9a) into master (2d7ae88) will decrease coverage by 47.31%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #598       +/-   ##
===========================================
- Coverage   71.64%   24.32%   -47.32%     
===========================================
  Files         119      120        +1     
  Lines        7367     7486      +119     
===========================================
- Hits         5278     1821     -3457     
- Misses       2089     5665     +3576     
Flag Coverage Δ
cpu 24.32% <0.00%> (-47.32%) ⬇️
pytest 24.32% <0.00%> (-47.32%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pl_bolts/models/rl/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
pl_bolts/models/rl/advantage_actor_critic_model.py 0.00% <0.00%> (ø)
pl_bolts/models/rl/common/agents.py 0.00% <0.00%> (-100.00%) ⬇️
pl_bolts/models/rl/common/networks.py 0.00% <0.00%> (-91.60%) ⬇️
pl_bolts/models/rl/dueling_dqn_model.py 0.00% <0.00%> (-100.00%) ⬇️
...l_bolts/models/rl/vanilla_policy_gradient_model.py 0.00% <0.00%> (-95.91%) ⬇️
pl_bolts/models/rl/double_dqn_model.py 0.00% <0.00%> (-95.84%) ⬇️
pl_bolts/models/rl/reinforce_model.py 0.00% <0.00%> (-89.40%) ⬇️
... and 68 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d7ae88...4687f9a.

@blahBlahhhJ blahBlahhhJ marked this pull request as ready for review March 20, 2021 04:21
@blahBlahhhJ
Contributor Author

blahBlahhhJ commented Mar 20, 2021

@akihironitta
@akihironitta
Hi, I've finished the implementation of A2C and tested its performance on OpenAI's CartPole environment (performance screenshot at the top). I think it's pretty good. However, I have a few questions regarding the PR.

  1. For the 4th checklist item, does "documentation" mean documentation in the code? If so, I wrote a fair amount of docs in my classes and methods.
  2. For the 6th checklist item, regarding testing: I've written some tests for A2C, and all of them pass. But a few tests from other components (many from the data module section) fail, and I'm 100% sure I did not touch any of them. I assume those failures existed before I forked this repo. How do I deal with that?
  3. For the last checklist item, I'm not sure whether I should update the CHANGELOG, so I left it incomplete.

Let me know what I should do next for this PR. Thanks!

Contributor

@akihironitta akihironitta left a comment


@blahBlahhhJ Hi, thank you for your contribution! The implementation looks great! I added a few commits directly to your branch to add its docs and fix some minor issues. I also left some comments below, so would you mind having a look at them?

docs/source/reinforce_learn.rst Outdated Show resolved Hide resolved
docs/source/reinforce_learn.rst Outdated Show resolved Hide resolved
pl_bolts/models/rl/advantage_actor_critic_model.py Outdated Show resolved Hide resolved
pl_bolts/models/rl/advantage_actor_critic_model.py Outdated Show resolved Hide resolved
pl_bolts/models/rl/advantage_actor_critic_model.py Outdated Show resolved Hide resolved
@akihironitta
Contributor

@blahBlahhhJ Thanks for the update!

  1. The classes and methods were well documented! They were just not indexed in the docs, but I added them to the docs in 22f3b85. :)
  2. You're right, the failing tests are irrelevant to the change in this PR. We will take care of that.
  3. I updated the changelog for you.

@akihironitta akihironitta added this to In progress in Reinforcement Learning via automation Mar 20, 2021
docs/source/reinforce_learn.rst Outdated Show resolved Hide resolved
return batch[0][0][0].device.index if self.on_gpu else "cpu"

@staticmethod
def add_model_specific_args(arg_parser: ArgumentParser) -> ArgumentParser:
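For context, the `add_model_specific_args` staticmethod quoted above conventionally fills an `ArgumentParser` with the model's hyperparameters. A self-contained sketch of the pattern; the flag names and defaults here are illustrative, not the PR's actual arguments:

```python
from argparse import ArgumentParser

def add_model_specific_args(arg_parser: ArgumentParser) -> ArgumentParser:
    # Hypothetical A2C hyperparameters; the real PR's flags may differ.
    arg_parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
    arg_parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    arg_parser.add_argument("--entropy-beta", type=float, default=0.01, help="entropy bonus weight")
    return arg_parser

parser = add_model_specific_args(ArgumentParser())
args = parser.parse_args(["--gamma", "0.95"])
print(args.gamma, args.lr, args.entropy_beta)
```

The review thread below discusses whether this manual registration is still needed once `LightningCLI` is used.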
Contributor Author

Hi, I looked a bit into it, and it seems like in order to add the arguments (add_model_specific_args()) in the LightningModule, I'd have to write my own subclass of LightningCLI, which defeats the purpose of using it to simplify the code. Let me know if my understanding is wrong. Or is there an example of using it in other RL algorithms?

Member

the LightningCLI takes and maps whatever you have in the module's __init__...
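The point of this reply: LightningCLI builds CLI flags by introspecting the module's __init__ signature, so no manual parser code is needed. The mechanism can be illustrated with plain introspection; the class below is a stand-in, not the PR's actual model:

```python
import inspect

class AdvantageActorCritic:
    # Stand-in for the LightningModule; hyperparameter names are illustrative.
    def __init__(self, gamma: float = 0.99, lr: float = 1e-3, entropy_beta: float = 0.01):
        self.gamma, self.lr, self.entropy_beta = gamma, lr, entropy_beta

# LightningCLI-style introspection: every __init__ parameter becomes a --flag
params = inspect.signature(AdvantageActorCritic.__init__).parameters
flags = {f"--{name}": p.default for name, p in params.items() if name != "self"}
print(flags)
```

Because the flags come from the signature, adding a hyperparameter to __init__ is enough to expose it on the command line.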

@Borda Borda added the ready label Jun 24, 2021
@Borda Borda requested a review from akihironitta June 24, 2021 07:45
@Borda Borda merged commit bd28835 into Lightning-Universe:master Aug 13, 2021
Reinforcement Learning automation moved this from In progress to Done Aug 13, 2021
@Borda Borda mentioned this pull request Aug 13, 2021
8 tasks
Labels: help wanted (Extra attention is needed), model, ready

Linked issues: none yet

5 participants