
Port TorchRL tutorial for DDPG Loss to pytorch.org #2413

Merged
merged 27 commits into pytorch:main on Jun 13, 2023

Conversation

@nairbv (Contributor) commented Jun 2, 2023

Port existing tutorial from torchrl to pytorch.org.

Fixes #2351

Description

Copy of https://pytorch.org/rl/tutorials/coding_ddpg.html
The tutorial uses half_cheetah, so I used that for the image in the index.

cc @vmoens

Checklist

  • The issue that is being fixed is referred to in the description (see above "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request.

@facebook-github-bot (Contributor) commented:

Hi @nairbv!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@github-actions bot added the rl (Issues related to reinforcement learning, tutorial, DQN, and so on), docathon-h1-2023 (A label for the docathon in H1 2023), and medium labels on Jun 2, 2023
@netlify bot commented Jun 2, 2023

Deploy Preview for pytorch-tutorials-preview ready!

🔨 Latest commit: f4a5e4b
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/64888d7b2273f50008c7f833
😎 Deploy Preview: https://deploy-preview-2413--pytorch-tutorials-preview.netlify.app

@svekars (Contributor) left a comment: A few editorial suggestions

# environment and reset it when required.
# Data collectors are designed to help developers have a tight control
# on the number of frames per batch of data, on the (a)sync nature of this
# collection and on the resources allocated to the data collection (e.g. GPU,
Suggested change:
- # collection and on the resources allocated to the data collection (e.g. GPU,
+ # collection and on the resources allocated to the data collection (for example, GPU,

# Data collectors are designed to help developers have a tight control
# on the number of frames per batch of data, on the (a)sync nature of this
# collection and on the resources allocated to the data collection (e.g. GPU,
# number of workers etc).
Suggested change:
- # number of workers etc).
+ # number of workers, and so on).

# - the policy,
# - the total number of frames before the collector is considered empty,
# - the maximum number of frames per trajectory (useful for non-terminating
# environments, like dm_control ones).
Suggested change:
- # environments, like dm_control ones).
+ # environments, like the ``dm_control`` ones).

#
# .. note::
# As already mentioned above, to get a more reasonable performance,
# use a greater value for ``total_frames`` e.g. 1M.
Suggested change:
- # use a greater value for ``total_frames`` e.g. 1M.
+ # use a greater value for ``total_frames`` for example, 1M.
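
To make the collector arguments discussed in the suggestions above concrete, here is a small, hedged sketch (not part of the PR) of how they map onto a TorchRL collector. It assumes Gym's Pendulum-v1 is installed locally; in the tutorial itself the environment factory and the DDPG policy built earlier are used instead:

# Minimal sketch of the collector arguments discussed above (illustrative values).
from torchrl.collectors import SyncDataCollector
from torchrl.envs.libs.gym import GymEnv

collector = SyncDataCollector(
    create_env_fn=lambda: GymEnv("Pendulum-v1"),  # stand-in for the tutorial's env factory
    policy=None,               # None -> random policy; the tutorial passes the DDPG actor
    frames_per_batch=200,      # number of frames delivered in each batch
    total_frames=1_000,        # collector is considered empty after this many frames
    max_frames_per_traj=200,   # truncate trajectories in non-terminating environments
)
for batch in collector:        # each ``batch`` is a TensorDict of transitions
    print(batch.shape)
collector.shutdown()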

# Conclusion
# ----------
#
# In this tutorial, we have learnt how to code a loss module in TorchRL given
Suggested change:
- # In this tutorial, we have learnt how to code a loss module in TorchRL given
+ # In this tutorial, we have learned how to code a loss module in TorchRL given
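
As a complement to the conclusion above, here is a condensed, hedged sketch of the workflow it summarizes, using TorchRL's built-in DDPGLoss as a stand-in for the loss module coded in the tutorial; ``actor``, ``value_net``, and ``sampled_data`` are placeholders for the networks and replay-buffer sample built earlier:

# Hedged sketch: loss module, target-network updater, and optimizer wiring.
from torch.optim import Adam
from torchrl.objectives import DDPGLoss, SoftUpdate

loss_module = DDPGLoss(actor_network=actor, value_network=value_net)
target_updater = SoftUpdate(loss_module, eps=0.999)  # soft target-network updates
optimizer = Adam(loss_module.parameters(), lr=1e-3)

loss_vals = loss_module(sampled_data)                # TensorDict of loss terms
loss = loss_vals["loss_actor"] + loss_vals["loss_value"]
loss.backward()
optimizer.step()
optimizer.zero_grad()
target_updater.step()   # older TorchRL versions may require calling init_() first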


@svekars svekars requested a review from vmoens June 2, 2023 21:51
@svekars (Contributor) commented Jun 5, 2023

@nairbv the PR looks great overall! Can you please address the editorial suggestions, sign the CLA, and fix the spellcheck errors?

Port existing tutorial from torchrl to pytorch.org.

pytorch#2351
@svekars (Contributor) commented Jun 5, 2023

Please rebase to run against the latest GH workflow.

@nairbv (Contributor, Author) commented Jun 5, 2023

Did a rebase and made some of the stylistic changes; still waiting internally to be added to the CLA.

@nairbv (Contributor, Author) commented Jun 5, 2023

The pyspelling spell checker seems to have an issue with acronyms like DDPG, common terms like cuda, and variable names mentioned in code comments. What's the recommended way of addressing these kinds of CI errors?

nairbv and others added 4 commits June 5, 2023 16:36
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
@svekars (Contributor) commented Jun 5, 2023

@nairbv Check the https://github.com/pytorch/tutorials/blob/main/en-wordlist.txt for the spelling of CUDA and some other words, and for the new acronyms you can propose an update to that file. All code references (names of functions, classes, API methods, etc.) should be enclosed in double quotes.

@nairbv (Contributor, Author) commented Jun 6, 2023

> you can propose an update to that file

hmm.... that works for terms like DDPG I guess but...

> All code references (name of functions, classes, API methods, etc) should be enclosed in double quotes.

This seems unconventional for single-line code comments. For example, for ``env`` on lines 502 and 860, putting ticks around it won't cause it to be rendered the way it would be in the text of the tutorial, and that isn't how short code comments are typically written within code. It also wouldn't make sense to add variable names from a particular code block to a dictionary file.

In cases like the env example, are double ticks really the right solution? Is there any other option?

@svekars (Contributor) commented Jun 6, 2023

It is a hack, yes. But Pyspelling can't differentiate between a one-line comment and a multiline comment.
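
For illustration, the workaround looks roughly like this in the tutorial source (the comment text here is made up; only the double-backtick wrapping matters):

# Flagged by pyspelling when the identifier is written bare:
#     # reset the env before collecting a new trajectory
# Passes the check when wrapped in double backticks (the markup renders as
# inline code in the prose blocks, but stays literal inside code comments):
#     # reset the ``env`` before collecting a new trajectory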

@vmoens (Contributor) left a comment: Happy with this. See my comment about next steps.

# loss component;
# - How to use (or not) a target network, and how to update its parameters;
# - How to create an optimizer associated with a loss module.
#
vmoens (Contributor):
Some ideas of next steps:

nairbv (Contributor, Author):

Is this comment to document potential future tutorial updates, or are you suggesting we should add a "next steps" section to the content of the tutorial?

vmoens (Contributor):

We should add a "Next steps" section or something like that.
The point is that most losses have more features than this "toy example" and it could be weird to read this tutorial from top to bottom without mentioning some nice features such as customization of the tensordict keys or using the loss without tensordict at all.
For now we don't have a tutorial about these features, but I would mention that these things are possible.
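
For reference, a hedged sketch of the key-customization feature mentioned here, assuming the flexible-keys interface proposed in pytorch/rl#1175 and using the built-in DDPGLoss; ``actor`` and ``value_net`` are placeholders and the key names are illustrative:

from torchrl.objectives import DDPGLoss

loss_module = DDPGLoss(actor_network=actor, value_network=value_net)
# Remap the entries the loss reads from the input tensordict:
loss_module.set_keys(reward="my_reward", done="my_done")
# The loss can also be called without a tensordict, passing tensors as keyword
# arguments, but the exact signature depends on the TorchRL version.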

nairbv (Contributor, Author):

Added, but not sure if that's exactly how you wanted it. Your link above for flexible keys just goes to a long list of issues; you might have meant to paste something else?

vmoens (Contributor):

should have been pytorch/rl#1175


@facebook-github-bot (Contributor) commented:
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@vmoens (Contributor) commented Jun 9, 2023

Regarding https://github.com/pytorch/tutorials/actions/runs/5214703804/jobs/9414299217?pr=2413#step:8:14384
Unfortunately I can't reproduce the error locally; I will need to dig a bit deeper to make sense of it...


@vmoens (Contributor) left a comment: This should fix our bug.

Comment on lines 859 to 861
create_env_fn=[
    parallel_env,
]
vmoens (Contributor):
I would suggest not using a parallel env here, but just the async collector + a regular env.
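
A hedged sketch of that suggestion (an asynchronous multi-worker collector over plain, non-parallel environments); ``env_constructor``, ``transform_state_dict``, and ``policy`` stand in for objects defined in the tutorial, and the frame counts are illustrative:

from torchrl.collectors import MultiaSyncDataCollector

env_creator = env_constructor(transform_state_dict)   # EnvCreator for a single regular env
collector = MultiaSyncDataCollector(
    create_env_fn=[env_creator, env_creator],          # one plain env per worker
    policy=policy,
    frames_per_batch=1000,
    total_frames=10_000,
)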

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
Comment on lines 583 to 593
-def parallel_env_constructor(
-    env_per_collector,
+def env_constructor(
     transform_state_dict,
 ):
-    if env_per_collector == 1:
-
-        def make_t_env():
-            env = make_transformed_env(make_env())
-            env.transform[2].init_stats(3)
-            env.transform[2].loc.copy_(transform_state_dict["loc"])
-            env.transform[2].scale.copy_(transform_state_dict["scale"])
-            return env
-
-        env_creator = EnvCreator(make_t_env)
-        return env_creator
-
-    parallel_env = ParallelEnv(
-        num_workers=env_per_collector,
-        create_env_fn=EnvCreator(lambda: make_env()),
-        create_env_kwargs=None,
-        pin_memory=False,
-    )
-    env = make_transformed_env(parallel_env)
-    # we call `init_stats` for a limited number of steps, just to instantiate
-    # the lazy buffers.
-    env.transform[2].init_stats(3, cat_dim=1, reduce_dim=[0, 1])
-    env.transform[2].load_state_dict(transform_state_dict)
-    return env
+    def make_t_env():
+        env = make_transformed_env(make_env())
+        env.transform[2].init_stats(3)
+        env.transform[2].loc.copy_(transform_state_dict["loc"])
+        env.transform[2].scale.copy_(transform_state_dict["scale"])
+        return env
+
+    env_creator = EnvCreator(make_t_env)
+    return env_creator
vmoens (Contributor) commented Jun 12, 2023:
@svekars we can revert this and use the old parallel_env_constructor

Contributor:

def parallel_env_constructor(
    env_per_collector,
    transform_state_dict,
):
    if env_per_collector == 1:

        def make_t_env():
            env = make_transformed_env(make_env())
            env.transform[2].init_stats(3)
            env.transform[2].loc.copy_(transform_state_dict["loc"])
            env.transform[2].scale.copy_(transform_state_dict["scale"])
            return env

        env_creator = EnvCreator(make_t_env)
        return env_creator

    parallel_env = ParallelEnv(
        num_workers=env_per_collector,
        create_env_fn=EnvCreator(lambda: make_env()),
        create_env_kwargs=None,
        pin_memory=False,
    )
    env = make_transformed_env(parallel_env)
    # we call `init_stats` for a limited number of steps, just to instantiate
    # the lazy buffers.
    env.transform[2].init_stats(3, cat_dim=1, reduce_dim=[0, 1])
    env.transform[2].load_state_dict(transform_state_dict)
    return env
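
For context, a hedged sketch of how the constructor above might be wired into a collector; ``transform_state_dict`` and ``policy`` come from the tutorial, and the frame counts are illustrative:

from torchrl.collectors import SyncDataCollector

env_creator = parallel_env_constructor(
    env_per_collector=1,          # returns an EnvCreator wrapping a single env
    transform_state_dict=transform_state_dict,
)
collector = SyncDataCollector(
    create_env_fn=env_creator,
    policy=policy,
    frames_per_batch=1000,
    total_frames=10_000,
)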

@svekars (Contributor) left a comment:
LGTM in general. Need to make sure to add the image from https://github.com/pytorch/rl/blob/main/docs/source/_static/img/replaybuffer_traj.png. This also feels like an advanced tutorial rather than an intermediate one, so we might want to move it to advanced_source. We can do that in a follow-up PR.

@svekars merged commit b7c93f4 into pytorch:main on Jun 13, 2023
12 checks passed
Labels
cla signed · docathon-h1-2023 (A label for the docathon in H1 2023) · medium · rl (Issues related to reinforcement learning, tutorial, DQN, and so on)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

💡 [REQUEST] - Port TorchRL "Coding a DDPG loss" from pytorch.org/rl to pytorch.org/tutorials
4 participants