Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor common #540

Merged
merged 31 commits into from
Feb 28, 2020
Merged

Refactor common #540

merged 31 commits into from
Feb 28, 2020

Conversation

Miffyli
Copy link
Collaborator

@Miffyli Miffyli commented Nov 4, 2019

Refactoring common to include most shared code between learning algorithms.

Description

Move (almost) any shared code out from algorithm-specific directories into commons, e.g:

  • a2c/utils.py defined most of the TF layers and utilities. Now in common/tf_layers.py
  • ReplayBuffer was defined under DQN but also used by DDPG. Moved to common/replay_buffer.py .
  • Some math utils like safe_mean moved from under PPO2 to common/math_util.py

Also removed unused code common/tf_util.py, calc_entropy_softmax.

PPO1/TRPO still share code between themselves (defined in PPO1). This should also be moved in commons, but at the same time they are the only ones using this very specific type of code (trajectory generation).

Things to-do and discuss:

  • Unifying some of the code. E.g. there are now two different types of parameter schedulers in common/schedulers.py.

Motivation and Context

closes #503

While there is a common directory for shared code between learning algorithms, bunch of shared code existed under learning algorithms that was imported elsewhere code (e.g. Half of the TF definitions were under a2c/utils.py). As discussed in #503, it would make more sense to have learning algorithms import shared code from common, and common should not depend on per-algorithm definitions.

This also aims to keep all backend (Tensorflow) related code few files, now contained in tf_util.py and tf_layers.py under common. This should help a tiny bit when transitioning to other backends.

Types of changes

  • Refactoring (no changes to documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • I have updated the documentation accordingly.

@araffin
Copy link
Collaborator

araffin commented Nov 4, 2019

Good that you started the refactoring =)
I will try to give you some feedback this week and open at the same time a PR for tf2 to have more visibility on the progress

@Miffyli
Copy link
Collaborator Author

Miffyli commented Nov 9, 2019

Given that this is mostly related to (and motivated by) the transition to new a new backend, I think we should merge this with the TF2 branch (#542) whenever it is ready for merging.

@araffin
Copy link
Collaborator

araffin commented Nov 9, 2019

I would merge it with both ;) (I should have the time today to look at this one)

Copy link
Collaborator

@araffin araffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM ;) (documentation is missing but you know it)
I don't know if we should be backward compatible when merging that with master (as we did for the action noise for instance)
This compatibility is obviously not required for tf2 branch.

stable_baselines/a2c/utils.py Outdated Show resolved Hide resolved
stable_baselines/common/schedules.py Show resolved Hide resolved
# Logging
# ================================================================

def total_episode_reward_logger(rew_acc, rewards, masks, writer, steps):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for myself: we should fix this one, it is not working properly when n_envs > 0 for a while now...

@Miffyli
Copy link
Collaborator Author

Miffyli commented Nov 20, 2019

Regarding this PR: I am starting to have conflicted feelings about moving things to common. One one hand it makes intuitive sense to have all shared code between and algorithms should not depend on each other, but on the other hand some of the current implementations are very algorithm specific (e.g. only used by PPO1 and TRPO). Doing this PR would end up with code in common which in reality is tightly tied to those specific algorithms.

Should we continue loyal to these guidelines (algorithms import common, common does not import algorithms), or should we figure out better refactoring while doing this? Some refactoring was already discussed on Schedulers.

Copy link
Collaborator

@araffin araffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your problem, so what I would do:

  1. put as much as you can (and think) in common, this should be fairly general helper/class
  2. put the helpers that are really algorithm specific into a utils.py next to the algorithm (that's already the case for A2C and TRPO)

stable_baselines/deepq/__init__.py Outdated Show resolved Hide resolved
@araffin araffin added this to the v3.0.0 milestone Nov 20, 2019
@araffin araffin added the v3 Discussion about V3 label Nov 23, 2019
@araffin araffin mentioned this pull request Feb 5, 2020
@Miffyli Miffyli marked this pull request as ready for review February 16, 2020 11:38
@Miffyli
Copy link
Collaborator Author

Miffyli commented Feb 16, 2020

Alright, now everything should be done. The only remaining "oddball" is GAIL importing from TRPO, but since these two are too intimately linked I decided to leave them as it is (and these algos should go to Adam's repository, as I understood).

@araffin
If good for merging, we can first merge e.g. #644 and I can sort out the merge conflicts and whatnot before merging this one :)

@Miffyli Miffyli requested a review from araffin February 16, 2020 11:40
@Miffyli
Copy link
Collaborator Author

Miffyli commented Feb 27, 2020

Merged #644 and other PRs. Everything is ready for review from my end.


- Algorithms no longer import from each other.
- `common` no longer import from algorithms.
- Moved shared code to new files `common/math_util.py`, `common/buffers.py` and `common/tf_layers.py`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are breaking changes, no?

Maybe it would be good to list which function were moved and where

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good point. Somehow I completely missed that ^^. I will document this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated changelog with such a list.

Copy link
Collaborator

@araffin araffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM =)

@araffin araffin merged commit a4efff0 into master Feb 28, 2020
@araffin araffin deleted the refactor/common branch February 28, 2020 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v3 Discussion about V3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] RL Common / Baselines Common Package
2 participants