
Advantage Actor Critic (A2C) Model #598

Merged
merged 46 commits into from
Aug 13, 2021

Conversation

Contributor

@blahBlahhhJ blahBlahhhJ commented Mar 19, 2021

What does this PR do?

Update for #596 (issue)
Implementation of A2C model for Reinforcement Learning
[Screenshot: CartPole training performance]
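For readers unfamiliar with A2C, the core update combines a policy-gradient (actor) term weighted by the advantage with a value-regression (critic) term, usually plus an entropy bonus. A minimal sketch in PyTorch, with illustrative tensor values; this is not the PR's actual code, and the names and coefficients are assumptions:

```python
import torch
import torch.nn.functional as F

# Hypothetical per-batch tensors (illustrative values, not from the PR):
# log_probs: log pi(a|s) of the taken actions, values: critic V(s), returns: discounted returns
log_probs = torch.log(torch.tensor([0.7, 0.6, 0.9]))
values = torch.tensor([0.5, 0.8, 1.0])
returns = torch.tensor([1.0, 0.5, 1.2])

# Advantage A(s, a) = R - V(s); detached so the actor term does not backprop into the critic
advantage = (returns - values).detach()

actor_loss = -(log_probs * advantage).mean()   # policy-gradient term
critic_loss = F.mse_loss(values, returns)      # value-regression term
entropy_bonus = 0.0                            # typically -beta * entropy(pi) to encourage exploration

loss = actor_loss + 0.5 * critic_loss + entropy_bonus
print(float(loss))
```

The `0.5` critic weight and the zero entropy bonus are placeholders; real implementations expose both as hyperparameters.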

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks

pep8speaks commented Mar 19, 2021

Hello @blahBlahhhJ! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-08-13 12:58:07 UTC

@github-actions github-actions bot added the model label Mar 19, 2021
@codecov

codecov bot commented Mar 20, 2021

Codecov Report

Merging #598 (4687f9a) into master (2d7ae88) will decrease coverage by 47.31%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #598       +/-   ##
===========================================
- Coverage   71.64%   24.32%   -47.32%     
===========================================
  Files         119      120        +1     
  Lines        7367     7486      +119     
===========================================
- Hits         5278     1821     -3457     
- Misses       2089     5665     +3576     
Flag Coverage Δ
cpu 24.32% <0.00%> (-47.32%) ⬇️
pytest 24.32% <0.00%> (-47.32%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pl_bolts/models/rl/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
pl_bolts/models/rl/advantage_actor_critic_model.py 0.00% <0.00%> (ø)
pl_bolts/models/rl/common/agents.py 0.00% <0.00%> (-100.00%) ⬇️
pl_bolts/models/rl/common/networks.py 0.00% <0.00%> (-91.60%) ⬇️
pl_bolts/models/rl/dueling_dqn_model.py 0.00% <0.00%> (-100.00%) ⬇️
...l_bolts/models/rl/vanilla_policy_gradient_model.py 0.00% <0.00%> (-95.91%) ⬇️
pl_bolts/models/rl/double_dqn_model.py 0.00% <0.00%> (-95.84%) ⬇️
pl_bolts/models/rl/reinforce_model.py 0.00% <0.00%> (-89.40%) ⬇️
... and 68 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d7ae88...4687f9a.

@blahBlahhhJ blahBlahhhJ marked this pull request as ready for review March 20, 2021 04:21
@blahBlahhhJ
Contributor Author

blahBlahhhJ commented Mar 20, 2021

@akihironitta
@akihironitta
Hi, I've finished the implementation of A2C and tested its performance on OpenAI's CartPole environment (performance screenshot at the top). I think it's pretty good. However, I have a few questions regarding the PR.

  1. For the 4th checklist item, does "documentation" mean documentation in the code? If so, I wrote a fair amount of docs in my classes and methods.
  2. For the 6th checklist item, regarding testing: I've written some tests for A2C, and all of them pass. But a few tests from other components (many from the data module section) fail, and I'm 100% sure I did not touch any of them. I assume those failures existed before I forked this repo. How do I deal with that?
  3. For the last checklist item, I'm not sure whether I should update the CHANGELOG, so I left it incomplete.

Let me know what I should do next for this PR. Thanks!

Contributor

@akihironitta akihironitta left a comment


@blahBlahhhJ Hi, thank you for your contribution! The implementation looks great! I added a few commits directly to your branch to add its docs and fix some minor issues. I also left some comments below, so would you mind having a look at them?

docs/source/reinforce_learn.rst Outdated Show resolved Hide resolved
docs/source/reinforce_learn.rst Outdated Show resolved Hide resolved
pl_bolts/models/rl/advantage_actor_critic_model.py Outdated Show resolved Hide resolved
pl_bolts/models/rl/advantage_actor_critic_model.py Outdated Show resolved Hide resolved
pl_bolts/models/rl/advantage_actor_critic_model.py Outdated Show resolved Hide resolved
@akihironitta
Contributor

@blahBlahhhJ Thanks for the update!

  1. The classes and methods were well documented! They were just not indexed in the docs, but I added them to the docs in 22f3b85. :)
  2. You're right, the failing tests are irrelevant to the change in this PR. We will take care of that.
  3. I updated the changelog for you.

@akihironitta akihironitta added this to In progress in Reinforcement Learning via automation Mar 20, 2021
docs/source/reinforce_learn.rst Outdated Show resolved Hide resolved
return batch[0][0][0].device.index if self.on_gpu else "cpu"

@staticmethod
def add_model_specific_args(arg_parser: ArgumentParser) -> ArgumentParser:
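For context, the `add_model_specific_args` staticmethod quoted above conventionally fills an `ArgumentParser` with the model's hyperparameters. A self-contained sketch of the pattern; the flag names and defaults here are illustrative, not the PR's actual arguments:

```python
from argparse import ArgumentParser

def add_model_specific_args(arg_parser: ArgumentParser) -> ArgumentParser:
    # Hypothetical A2C hyperparameters; the real PR's flags may differ.
    arg_parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
    arg_parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    arg_parser.add_argument("--entropy-beta", type=float, default=0.01, help="entropy bonus weight")
    return arg_parser

parser = add_model_specific_args(ArgumentParser())
args = parser.parse_args(["--gamma", "0.95"])
print(args.gamma, args.lr, args.entropy_beta)
```

The review thread below discusses whether this manual registration is still needed once `LightningCLI` is used.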
Contributor Author

Hi, I looked a bit into it, and it seems like in order to add the arguments (add_model_specific_args()) in the LightningModule, I'd have to write my own subclass of LightningCLI, which defeats the purpose of using it to simplify the code. Let me know if my understanding is wrong. Or is there an example of using it in other RL algorithms?

Member

the LightningCLI takes and maps whatever you have in the module's __init__...
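The point of this reply: LightningCLI builds CLI flags by introspecting the module's __init__ signature, so no manual parser code is needed. The mechanism can be illustrated with plain introspection; the class below is a stand-in, not the PR's actual model:

```python
import inspect

class AdvantageActorCritic:
    # Stand-in for the LightningModule; hyperparameter names are illustrative.
    def __init__(self, gamma: float = 0.99, lr: float = 1e-3, entropy_beta: float = 0.01):
        self.gamma, self.lr, self.entropy_beta = gamma, lr, entropy_beta

# LightningCLI-style introspection: every __init__ parameter becomes a --flag
params = inspect.signature(AdvantageActorCritic.__init__).parameters
flags = {f"--{name}": p.default for name, p in params.items() if name != "self"}
print(flags)
```

Because the flags come from the signature, adding a hyperparameter to __init__ is enough to expose it on the command line.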

@Borda Borda added the ready label Jun 24, 2021
@Borda Borda requested a review from akihironitta June 24, 2021 07:45
@Borda Borda merged commit bd28835 into Lightning-Universe:master Aug 13, 2021
Reinforcement Learning automation moved this from In progress to Done Aug 13, 2021
@Borda Borda mentioned this pull request Aug 13, 2021
8 tasks
Labels: help wanted (Extra attention is needed), model, ready

Linked issues: none yet

5 participants