
on_before_zero_grad called before on_after_backward #6665

Closed
a1302z opened this issue Mar 24, 2021 · 2 comments
Labels: bug (Something isn't working), help wanted (Open to be worked on)

Comments


a1302z commented Mar 24, 2021

🐛 Hook order differs from what is documented

The documentation says the methods in the train loop are called in the following order:

def train_loop():
    on_train_epoch_start()
    train_outs = []
    for train_batch in train_dataloader():
        on_train_batch_start()
        out = training_step(batch)
        train_outs.append(out)
        loss = out.loss
        backward()
        on_after_backward()
        optimizer_step()
        on_before_zero_grad()
        optimizer_zero_grad()
        on_train_batch_end(out)

Furthermore, the description of on_after_backward says:

Called in the training loop after loss.backward() and before optimizers do anything. This is the ideal place to inspect or log gradient information.

For on_before_zero_grad it says:

Called after optimizer.step() and before optimizer.zero_grad().

both of which match the training loop defined above.

However, if I use these methods as in the code below, on_before_zero_grad is always called before on_after_backward.

Reproduction

I've attached the code from the README.md of the GitHub repo, slightly modified by adding the above-mentioned methods; a sketch of such a modification follows below.
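For reference, a minimal sketch of that kind of modification, assuming a toy model and random data rather than the exact attached script (HookOrderModel and its hyperparameters are placeholders):

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class HookOrderModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    # hooks added only to observe the call order
    def on_after_backward(self):
        print("on_after_backward")

    def on_before_zero_grad(self, optimizer):
        print("on_before_zero_grad")

if __name__ == "__main__":
    data = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    trainer = pl.Trainer(max_epochs=1, limit_train_batches=2)
    trainer.fit(HookOrderModel(), DataLoader(data, batch_size=16))

Running this prints "on_before_zero_grad" before "on_after_backward" for every batch.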

Expected behavior

The methods are called in the order described in the documentation.

Environment

environment.yml file:

name: hook-order
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - python>=3.7.0
  - pytorch>=1.8.0
  - torchvision>=0.9.0
  - cudatoolkit>=11.1
  - scipy
  - torchcsprng
  - pytest
  - mypy
  - black
  - scikit-learn
  - pytorch-lightning
  - matplotlib
  - rope
  - pip
  - pip:
    - testfixtures
    - segmentation-models-pytorch


  • PyTorch Version: 1.8.0
  • OS: Ubuntu 20.04
  • How you installed PyTorch: conda env create -f environment.yml
  • Python version: 3.9.2

Additional context

I just need a method that is called right after optimizer_step(), so if there is any alternative, please let me know.
Thanks in advance.

a1302z added the bug and help wanted labels on Mar 24, 2021
@awaelchli (Contributor) commented:

I believe these docs are outdated, thanks for reporting.
The order was recently changed: #6147
zero_grad now comes before backward, and therefore I would say on_before_zero_grad being called before on_after_backward is also correct.

> I just need a method that is called right after optimizer_step(), so if there is any alternative, please let me know.
> Thanks in advance.

Maybe in LightningModule:

def optimizer_step(self, *args, **kwargs):
    super().optimizer_step(*args, **kwargs)
    # do something after

Could that work for you?
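For illustration, a minimal sketch of that override added to a LightningModule such as the one sketched above (the print is just a placeholder for whatever should run right after the step):

import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def optimizer_step(self, *args, **kwargs):
        # the default implementation performs the actual optimizer step
        super().optimizer_step(*args, **kwargs)
        # anything here runs immediately after the step has been taken
        print("optimizer step finished")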


a1302z commented Mar 25, 2021

Thanks, that did the trick.

a1302z closed this as completed on Mar 25, 2021