
on_before_zero_grad called before on_after_backward #6665

Closed
a1302z opened this issue Mar 24, 2021 · 2 comments
Labels: bug (Something isn't working), help wanted (Open to be worked on)

Comments


a1302z commented Mar 24, 2021

🐛 Hook order differs from what is documented

The documentation says the methods in the train loop are called in the following order:

def train_loop():
    on_train_epoch_start()
    train_outs = []
    for train_batch in train_dataloader():
        on_train_batch_start()
        out = training_step(batch)
        train_outs.append(out)
        loss = out.loss
        backward()
        on_after_backward()
        optimizer_step()
        on_before_zero_grad()
        optimizer_zero_grad()
        on_train_batch_end(out)

Furthermore, the description of on_after_backward says:

Called in the training loop after loss.backward() and before optimizers do anything. This is the ideal place to inspect or log gradient information.

For on_before_zero_grad it says:

Called after optimizer.step() and before optimizer.zero_grad().

both of which match the training loop defined above.

However, if I use these methods as in the code below, on_before_zero_grad is always called before on_after_backward.

Reproduction

I've attached the code from the README.md of the GitHub repo, slightly modified by adding the above-mentioned methods; a sketch of such a modification follows below.
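For reference, a minimal sketch of that kind of modification, assuming a toy model and random data rather than the exact attached script (HookOrderModel and its hyperparameters are placeholders):

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class HookOrderModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    # hooks added only to observe the call order
    def on_after_backward(self):
        print("on_after_backward")

    def on_before_zero_grad(self, optimizer):
        print("on_before_zero_grad")

if __name__ == "__main__":
    data = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    trainer = pl.Trainer(max_epochs=1, limit_train_batches=2)
    trainer.fit(HookOrderModel(), DataLoader(data, batch_size=16))

Running this prints "on_before_zero_grad" before "on_after_backward" for every batch.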

Expected behavior

The methods are called in the order described in the documentation.

Environment

environment.yml file:

name: hook-order
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - python>=3.7.0
  - pytorch>=1.8.0
  - torchvision>=0.9.0
  - cudatoolkit>=11.1
  - scipy
  - torchcsprng
  - pytest
  - mypy
  - black
  - scikit-learn
  - pytorch-lightning
  - matplotlib
  - rope
  - pip
  - pip:
    - testfixtures
    - segmentation-models-pytorch


  • PyTorch Version: 1.8.0
  • OS: Ubuntu 20.04
  • How you installed PyTorch: conda env create -f environment.yml
  • Python version: 3.9.2

Additional context

I just need a method that is called right after optimizer_step(), so if there is any alternative, please let me know.
Thanks in advance.

a1302z added the bug and help wanted labels on Mar 24, 2021
@awaelchli (Contributor) commented:

I believe these docs are outdated, thanks for reporting.
The order was recently changed: #6147
zero_grad now comes before backward, and therefore I would say on_before_zero_grad being called before on_after_backward is also correct.

> I just need a method that is called right after optimizer_step(), so if there is any alternative, please let me know.
> Thanks in advance.

Maybe in LightningModule:

def optimizer_step(self, *args, **kwargs):
    super().optimizer_step(*args, **kwargs)
    # do something after

Could that work for you?
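For illustration, a minimal sketch of that override added to a LightningModule such as the one sketched above (the print is just a placeholder for whatever should run right after the step):

import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def optimizer_step(self, *args, **kwargs):
        # the default implementation performs the actual optimizer step
        super().optimizer_step(*args, **kwargs)
        # anything here runs immediately after the step has been taken
        print("optimizer step finished")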


a1302z commented Mar 25, 2021

Thanks, that did the trick.

a1302z closed this as completed on Mar 25, 2021