-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor common #540
Refactor common #540
Conversation
Good that you started the refactoring =) |
Given that this is mostly related to (and motivated by) the transition to new a new backend, I think we should merge this with the TF2 branch (#542) whenever it is ready for merging. |
I would merge it with both ;) (I should have the time today to look at this one) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM ;) (documentation is missing but you know it)
I don't know if we should be backward compatible when merging that with master (as we did for the action noise for instance)
This compatibility is obviously not required for tf2 branch.
# Logging | ||
# ================================================================ | ||
|
||
def total_episode_reward_logger(rew_acc, rewards, masks, writer, steps): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for myself: we should fix this one, it is not working properly when n_envs > 0 for a while now...
Regarding this PR: I am starting to have conflicted feelings about moving things to Should we continue loyal to these guidelines (algorithms import common, common does not import algorithms), or should we figure out better refactoring while doing this? Some refactoring was already discussed on Schedulers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your problem, so what I would do:
- put as much as you can (and think) in common, this should be fairly general helper/class
- put the helpers that are really algorithm specific into a
utils.py
next to the algorithm (that's already the case for A2C and TRPO)
Alright, now everything should be done. The only remaining "oddball" is GAIL importing from TRPO, but since these two are too intimately linked I decided to leave them as it is (and these algos should go to Adam's repository, as I understood). @araffin |
Merged #644 and other PRs. Everything is ready for review from my end. |
docs/misc/changelog.rst
Outdated
|
||
- Algorithms no longer import from each other. | ||
- `common` no longer import from algorithms. | ||
- Moved shared code to new files `common/math_util.py`, `common/buffers.py` and `common/tf_layers.py`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are breaking changes, no?
Maybe it would be good to list which function were moved and where
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good point. Somehow I completely missed that ^^. I will document this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated changelog with such a list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM =)
Refactoring
common
to include most shared code between learning algorithms.Description
Move (almost) any shared code out from algorithm-specific directories into commons, e.g:
a2c/utils.py
defined most of the TF layers and utilities. Now incommon/tf_layers.py
ReplayBuffer
was defined under DQN but also used by DDPG. Moved tocommon/replay_buffer.py
.safe_mean
moved from under PPO2 tocommon/math_util.py
Also removed unused code
common/tf_util.py, calc_entropy_softmax
.PPO1/TRPO still share code between themselves (defined in PPO1). This should also be moved in commons, but at the same time they are the only ones using this very specific type of code (trajectory generation).
Things to-do and discuss:
common/schedulers.py
.Motivation and Context
closes #503
While there is a
common
directory for shared code between learning algorithms, bunch of shared code existed under learning algorithms that was imported elsewhere code (e.g. Half of the TF definitions were undera2c/utils.py
). As discussed in #503, it would make more sense to have learning algorithms import shared code fromcommon
, andcommon
should not depend on per-algorithm definitions.This also aims to keep all backend (Tensorflow) related code few files, now contained in
tf_util.py
andtf_layers.py
undercommon
. This should help a tiny bit when transitioning to other backends.Types of changes
Checklist: