
Recurrent DQN: Training recurrent policies #2643

Merged · 12 commits · Nov 8, 2023
Conversation

@markstur (Contributor) commented Nov 3, 2023

Fixes #2349

Description

Add tutorial from TorchRL

Checklist

  • The issue that is being fixed is referenced in the description (see above: "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request.

cc @vmoens @nairbv @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen

Fixes: pytorch#2349

Signed-off-by: markstur <mark.sturdevant@ibm.com>

pytorch-bot bot commented Nov 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2643

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bba204e with merge base 789fc09 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions bot added labels rl (Issues related to reinforcement learning tutorial, DQN, and so on), medium, docathon-h2-2023 and removed the cla signed label — Nov 3, 2023
@svekars svekars requested a review from vmoens November 3, 2023 23:48
@svekars (Contributor) commented Nov 3, 2023

Make sure to fix the spellcheck. Some words can be added to the en-wordlist.txt to be skipped.

* Fix spellcheck issues
* Add link to TorchRL
* Plot was blank; removed the unwanted every-50th check
* Misc tweaks

Signed-off-by: markstur <mark.sturdevant@ibm.com>
Signed-off-by: markstur <mark.sturdevant@ibm.com>
@markstur markstur marked this pull request as ready for review November 5, 2023 04:28
Signed-off-by: markstur <mark.sturdevant@ibm.com>
* Should be connected to Conclusion to be formatted properly

Signed-off-by: markstur <mark.sturdevant@ibm.com>
@vmoens (Contributor) left a comment

Thanks a lot for this!
I left some suggestions, mostly to get better-looking links to the docs.


This figure should be updated; I just provided a new version.

# Conclusion
# ----------
#
# We have seen how an RNN can be incorporated in a policy in torchrl.

Suggested change
# We have seen how an RNN can be incorporated in a policy in torchrl.
# We have seen how an RNN can be incorporated in a policy in TorchRL.

en-wordlist.txt Outdated
@@ -207,6 +208,7 @@ TorchDynamo
TorchInductor
TorchMultimodal
TorchRL
torchrl

Suggested change
torchrl

Since we removed the only occurrence of torchrl in the comments.

# As this figure shows, our environment populates the TensorDict with zeroed recurrent
# states which are read by the policy together with the observation to produce an
# action, and recurrent states that will be used for the next step.
# When the :func:`torchrl.envs.step_mdp` function is called, the recurrent states

Suggested change
# When the :func:`torchrl.envs.step_mdp` function is called, the recurrent states
# When the :func:`~torchrl.envs.utils.step_mdp` function is called, the recurrent states
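The mechanism being discussed — the environment writes fresh recurrent states under a "next" entry, and the step function then promotes those entries to the root so the policy reads them at the following step — can be illustrated with a minimal pure-Python analogue. This is not the TorchRL API; `step_mdp_like` and the dict layout below are illustrative stand-ins for `step_mdp` and a TensorDict.

```python
# A minimal pure-Python analogue of the step bookkeeping described above.
# NOT the TorchRL API: it only shows how entries written under "next"
# (including recurrent states) become the root data of the following step.

def step_mdp_like(tensordict: dict) -> dict:
    """Return the root data for the next step from the 'next' sub-dict.

    Recurrent states live inside 'next', so they are carried over to the
    policy's input automatically."""
    return dict(tensordict["next"])

# At reset, the policy reads zeroed recurrent states; after one step, the
# states it produced are stored under "next" alongside the new observation.
td = {
    "observation": [0.1, 0.2],
    "hidden": [0.0, 0.0],  # zeroed states at reset
    "next": {"observation": [0.3, 0.4], "hidden": [0.5, -0.5]},
}
td = step_mdp_like(td)
print(td["hidden"])  # [0.5, -0.5]: last step's output states are now inputs
```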

# 84x84, scaling down the rewards and normalizing the observations.
#
# .. note::
# The :class:`torchrl.envs.StepCounter` transform is accessory. Since the CartPole

Suggested change
# The :class:`torchrl.envs.StepCounter` transform is accessory. Since the CartPole
# The :class:`~torchrl.envs.transforms.StepCounter` transform is accessory. Since the CartPole
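Conceptually, a step-counting transform like the one referenced above wraps an environment, counts elapsed steps, and truncates the rollout once a maximum is reached. Below is a hedged pure-Python sketch of that idea; `StepCounterWrapper` and `ToyEnv` are hypothetical names, not TorchRL's implementation.

```python
# Hypothetical sketch of what a step-counting transform does conceptually:
# wrap an environment, count steps, and flag truncation at max_steps.
# Illustrative only — not torchrl.envs.transforms.StepCounter itself.

class StepCounterWrapper:
    def __init__(self, env, max_steps=None):
        self.env = env
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self):
        self.step_count = 0  # counter restarts with each episode
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.step_count += 1
        truncated = (self.max_steps is not None
                     and self.step_count >= self.max_steps)
        return obs, reward, done or truncated

class ToyEnv:
    """Stand-in environment that never terminates on its own."""
    def reset(self):
        return 0.0
    def step(self, action):
        return 0.0, 1.0, False

env = StepCounterWrapper(ToyEnv(), max_steps=200)
env.reset()
done, steps = False, 0
while not done:
    _, _, done = env.step(0)
    steps += 1
print(steps)  # 200: the wrapper truncated the episode
```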

#
# .. code-block:: bash
#
# !pip3 install torchrl-nightly

Suggested change
# !pip3 install torchrl-nightly
# !pip3 install torchrl

# ~~~~~~~~~~~
#
# TorchRL provides a specialized :class:`torchrl.modules.LSTMModule` class
# to incorporate LSTMs in your code-base. It is a :class:`tensordict.nn.TensorDictModuleBase`

Suggested change
# to incorporate LSTMs in your code-base. It is a :class:`tensordict.nn.TensorDictModuleBase`
# to incorporate LSTMs in your code-base. It is a :class:`~tensordict.nn.TensorDictModuleBase`
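The key property of a TensorDictModule-style class like the one discussed above is its dict-in/dict-out calling convention: the module reads its inputs from named keys (observation plus recurrent state) and writes its outputs, including the next recurrent state, back into the same mapping. A toy pure-Python sketch of that convention follows; the "recurrent cell" is just a running sum, and `ToyRecurrentModule` is a hypothetical stand-in, not `torchrl.modules.LSTMModule`.

```python
# Conceptual sketch of the dict-in/dict-out calling convention: read from
# in_keys, write results (including the next recurrent state) to out_keys.
# The toy cell below is a running sum; TorchRL's LSTMModule wraps a real
# torch.nn.LSTM instead.

class ToyRecurrentModule:
    in_keys = ["observation", "hidden"]
    out_keys = ["embedding", ("next", "hidden")]

    def __call__(self, td: dict) -> dict:
        obs, hidden = td["observation"], td["hidden"]
        new_hidden = hidden + obs            # stand-in for the LSTM update
        td["embedding"] = new_hidden * 2.0   # stand-in for the cell output
        # the next step reads this state, as in the step_mdp mechanism
        td.setdefault("next", {})["hidden"] = new_hidden
        return td

td = {"observation": 1.0, "hidden": 0.0}
td = ToyRecurrentModule()(td)
print(td["embedding"], td["next"]["hidden"])  # 2.0 1.0
```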

# it is important to pass data that is not flattened
rb.extend(data.unsqueeze(0).to_tensordict().cpu())
for _ in range(utd):
    s = rb.sample().to(device)

Suggested change
s = rb.sample().to(device)
s = rb.sample().to(device, non_blocking=True)
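The snippet under review stores whole trajectories (hence the `unsqueeze(0)` and the warning against flattening) and then samples several times per collected batch — the `utd` (update-to-data) loop. A minimal pure-Python sketch of that storage pattern, under the assumption that each buffer entry is one full rollout; `TrajectoryReplayBuffer` is a hypothetical stand-in, not TorchRL's replay buffer.

```python
# Minimal sketch of the pattern above: the buffer keeps whole trajectories
# (one entry per rollout, never flattened into single transitions), and
# training draws several minibatches per collected rollout — the "utd"
# (update-to-data) ratio. random-based stand-in, not TorchRL's buffer.
import random

class TrajectoryReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def extend(self, trajectory):
        self.storage.append(trajectory)  # stored as a unit, not flattened
        if len(self.storage) > self.capacity:
            self.storage.pop(0)          # drop the oldest rollout

    def sample(self, batch_size=1):
        return random.sample(self.storage, batch_size)

rb = TrajectoryReplayBuffer(capacity=100)
utd = 4
trajectory = [("obs0", "act0"), ("obs1", "act1")]  # one whole rollout
rb.extend(trajectory)
for _ in range(utd):                 # several updates per collected batch
    batch = rb.sample(batch_size=1)  # each sample is a full trajectory
assert batch[0] is trajectory
```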

# We have seen how an RNN can be incorporated in a policy in torchrl.
# You should now be able:
#
# - Create an LSTM module that acts as a :class:`tensordict.nn.TensorDictModule`

Suggested change
# - Create an LSTM module that acts as a :class:`tensordict.nn.TensorDictModule`
# - Create an LSTM module that acts as a :class:`~tensordict.nn.TensorDictModule`

@@ -25,7 +25,7 @@ tensorboard
jinja2==3.0.3
pytorch-lightning
torchx
torchrl==0.2.0
torchrl==0.2.1

Suggested change
torchrl==0.2.1
torchrl==0.2.0

We're sticking to 0.2.0 for now; otherwise we'd need to upgrade both rl and tensordict to 0.2.1.

@markstur (Contributor, Author) commented Nov 7, 2023

@vmoens Does this work on MacOS with 0.2.1?


Yes, it should, so we can update both dependencies.

* mostly better looking links
* torchrl and tensordict bump to 0.2.1 to support MacOS
* updated image
* updated Further Reading to go to TorchRL docs

Signed-off-by: markstur <mark.sturdevant@ibm.com>
…o issue2349

Signed-off-by: markstur <mark.sturdevant@ibm.com>
@markstur (Contributor, Author) commented Nov 8, 2023

Thanks @vmoens, I think I got all the fixes in. Note: I had to use dev tools -> Empty Cache and Hard Reload in my browser to see the line in the chart in the tutorial preview.

@vmoens (Contributor) left a comment

LGTM, it looks awesome! Thanks a mil!

@svekars svekars merged commit ab4e99a into pytorch:main Nov 8, 2023
18 checks passed
Successfully merging this pull request may close these issues.

💡 [REQUEST] - Port TorchRL Recurrent DQN tutorial from pytorch.org/rl to pytorch.org/tutorials
4 participants