
give "validation sanity check" flag for "validation_epoch_end" & "validation_step" #1391

Closed
davinnovation opened this issue Apr 6, 2020 · 12 comments
Labels
feature (Is an improvement or enhancement), help wanted (Open to be worked on), won't fix (This will not be worked on)

Comments

@davinnovation
Contributor

davinnovation commented Apr 6, 2020

🚀 Feature

Motivation

When using a custom saver or logger in the validation hooks (validation_epoch_end, validation_step) with Trainer.fit(), the validation sanity check always runs first, so the logs get cluttered with its output.

Pitch

def validation_step(self, batch, batch_nb, sanity_check):
    if sanity_check:
        ...

def validation_epoch_end(self, outputs, sanity_check):
    if sanity_check:
        ...

or

def validation_step(self, batch, batch_nb):
    if self.sanity_check:
        ...

def validation_epoch_end(self, outputs):
    if self.sanity_check:
        ...

Alternatives

None

Additional context

None
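
For context, a minimal sketch of a workaround that already exists: the sanity check can be skipped entirely with the num_sanity_val_steps Trainer argument. This disables the check rather than flagging it inside the hooks, so it is not a substitute for the proposed flag.

from pytorch_lightning import Trainer

# Skip the pre-fit validation sanity check entirely (by default it runs a couple of batches).
trainer = Trainer(num_sanity_val_steps=0)
trainer.fit(model)  # model is assumed to be a LightningModule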

@davinnovation davinnovation added feature Is an improvement or enhancement help wanted Open to be worked on labels Apr 6, 2020
@awaelchli
Member

This could be addressed with the Trainer states: #1633

@stale

stale bot commented Jul 3, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the won't fix This will not be worked on label Jul 3, 2020
@stale stale bot closed this as completed Jul 12, 2020
@ZhaofengWu
Contributor

ZhaofengWu commented Dec 23, 2020

@awaelchli I'm looking at #1633 and the merged PR #2541. What's the training state that corresponds to this usage? Is it TrainerState.INITIALIZING? Though it looks like fit starts with TrainerState.RUNNING, while the sanity check happens within fit. Does this mean there's still no way to do this right now?

@ZhaofengWu
Contributor

Never mind, I found the flag trainer.running_sanity_check.
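
A minimal sketch of how that flag can be used from a LightningModule (running_sanity_check is the attribute name from that era; the val_loss key and the custom saver are illustrative placeholders, not Lightning API):

def validation_epoch_end(self, outputs):
    # Skip custom saving/logging while the sanity check is running.
    if self.trainer.running_sanity_check:
        return
    avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()  # assumes import torch
    self.my_custom_saver.write(avg_loss)  # hypothetical custom saver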

@noamzilo
Contributor

Is there a way to disable .log and .logger for the sanity step using the framework in an elegant way?

@awaelchli
Member

@noamzilo how about we provide logger.enable() and logger.disable() methods?
Then the user could call these for example in the LightningModule, to temporarily disable logging.
For example:

def on_sanity_check_start(self):
    self.logger.disable()

def on_sanity_check_end(self):
    self.logger.enable()
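
Since logger.enable()/disable() were only a proposal at this point, here is a minimal sketch of how the same effect could be approximated with a small wrapper written on the user side (LoggerToggle is illustrative, not Lightning API):

class LoggerToggle:
    # Thin wrapper that silently drops metrics while disabled.
    def __init__(self, logger):
        self._logger = logger
        self._enabled = True

    def disable(self):
        self._enabled = False

    def enable(self):
        self._enabled = True

    def log_metrics(self, metrics, step=None):
        if self._enabled:
            self._logger.log_metrics(metrics, step)

It could then be toggled from the hooks shown above (on_sanity_check_start / on_sanity_check_end).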

@noamzilo
Contributor

noamzilo commented Jan 18, 2021

@noamzilo how about we provide logger.enable() and logger.disable() methods?
Then the user could call these for example in the LightningModule, to temporarily disable logging.
For example:

def on_sanity_check_start(self):
    self.logger.disable()

def on_sanity_check_end(self):
    self.logger.enable() 

sounds great :)
what about .log?


While I'm at it, I would like to raise a concern I ran into; I'm not sure whether I am doing something wrong:

[two TensorBoard plots of val_loss and train_loss over epochs]

These are plotted using

tb.add_scalars("losses", {"val_loss": loss}, global_step=self.current_epoch)
tb.add_scalars("losses", {"train_loss": loss}, global_step=self.current_epoch)

(tb being self.logger.experiment)

with

trainer = Trainer(
    overfit_batches=True,
    logger=TensorBoardLogger(...),
)

and a single sample per batch.

This configuration should mean that the training loss should always equal the validation loss.

However, as the graphs show (and as I verified with the debugger), the training loss lags one epoch behind the validation loss, which makes the train loss appear LARGER than the val loss; that is what caught my attention.
They are exactly equal, just offset by one epoch.

Using tb.add_scalars("losses", {"train_loss": loss}, global_step=self.current_epoch - 1) works around this, but I doubt that was the original designers' intention.

Am I doing something wrong? Did I find a bug?

Initially, I thought this was due to the sanity check epoch, but that doesn't seem to be the case.

This happens on every dataset I have tried.
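
Not an explanation of the off-by-one itself, but a sketch of an alternative that avoids manual global_step bookkeeping: letting self.log aggregate per-epoch values for both losses (metric names and the loss helper are illustrative; the two curves can still be overlaid in TensorBoard afterwards):

def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical loss helper
    # Aggregated over the epoch and written once per epoch by Lightning.
    self.log("train_loss", loss, on_step=False, on_epoch=True)
    return loss

def validation_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)
    self.log("val_loss", loss, on_step=False, on_epoch=True)
    return loss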

@AtomScott

AtomScott commented Feb 8, 2021

@noamzilo @awaelchli has this been implemented?

def on_sanity_check_start(self):
    self.logger.disable()

def on_sanity_check_end(self):
    self.logger.enable() 

If not, shouldn't we reopen this issue? This is a feature I would use daily, and also something where I had to write my own workaround, so I don't mind getting my hands dirty.

@jmerkow

jmerkow commented Sep 14, 2021

@noamzilo @awaelchli Bumping this issue....

@ZhaofengWu how do you access this flag from a PL module?

@ananthsub
Contributor

ananthsub commented Sep 14, 2021

You can use if self.trainer.sanity_checking inside the LightningModule.
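
A minimal sketch of that check inside a validation hook (sanity_checking is the newer attribute name; the loss helper and custom logger are illustrative placeholders):

def validation_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical loss helper
    if not self.trainer.sanity_checking:
        self.my_table_logger.add_row(step=self.global_step, val_loss=loss.item())  # hypothetical custom logger
    return loss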

@brendanartley

brendanartley commented Jul 11, 2023

It seems that the trainer.sanity_checking variable is not accessible in a callback?

import pytorch_lightning as pl

class CustomCallback(pl.Callback):
    def __init__(self):
        super().__init__()

    def on_validation_epoch_end(self, trainer, module):
        if not trainer.sanity_checking:
            return
        else:
            ...  # do something here

This throws the following error.

AttributeError: 'Trainer' object has no attribute 'running_sanity_check'

@klieret
Contributor

klieret commented Jul 24, 2023

Are you sure you didn't mistype trainer.running_sanity_check instead of trainer.sanity_checking? I can access the variable just as in your code snippet.
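
For callbacks that have to run across Lightning versions, a small defensive sketch (the attribute fallback chain is an assumption about which name a given release exposes):

def _is_sanity_checking(trainer):
    # Newer releases expose trainer.sanity_checking; older ones used
    # trainer.running_sanity_check. Fall back between the two.
    return bool(getattr(trainer, "sanity_checking",
                        getattr(trainer, "running_sanity_check", False)))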
