
Early stopping callback #2151

Closed
adeboissiere opened this issue Jun 11, 2020 · 10 comments · Fixed by #2391
Labels: bug (Something isn't working), help wanted (Open to be worked on)

Comments

@adeboissiere

🐛 Bug

Early stopping does not have the desired effect when using a custom callback. Even when I create a custom callback with the default values, training stops before the early stopping conditions are met.

To Reproduce

  1. Create callback
early_stop_callback = EarlyStopping(
        monitor='val_loss',
        min_delta=0.00,
        patience=3,
        verbose=False,
        mode='min',
        strict=True
    )
  2. Create trainer
trainer = Trainer.from_argparse_args(Namespace(**dict(train_config)), early_stop_callback=early_stop_callback)
  3. Train
trainer.fit(model)
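
For reference, here is the full snippet in one place (the imports reflect how I pull in EarlyStopping and Trainer; train_config and model come from the rest of my project and are not shown):

    from argparse import Namespace

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import EarlyStopping

    early_stop_callback = EarlyStopping(
        monitor='val_loss',
        min_delta=0.00,
        patience=3,
        verbose=False,
        mode='min',
        strict=True
    )

    # train_config (a dict-like config) and model (a LightningModule) are defined elsewhere
    trainer = Trainer.from_argparse_args(
        Namespace(**dict(train_config)),
        early_stop_callback=early_stop_callback
    )
    trainer.fit(model)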

Here are the validation steps in the model:

    def validation_step(self, batch, batch_idx):
        batch, y = batch
        y_hat = self(batch)

        loss = F.cross_entropy(y_hat, y.long())
        labels_hat = torch.argmax(y_hat, dim=1)
        n_correct_pred = torch.sum(y == labels_hat).item()

        return {'val_loss': loss, "n_correct_pred": n_correct_pred, "n_pred": len(y)}

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        val_acc = sum([x['n_correct_pred'] for x in outputs]) / sum(x['n_pred'] for x in outputs)
        tensorboard_logs = {'val_loss': avg_loss, 'val_acc': val_acc}

        return {'val_loss': avg_loss, 'log': tensorboard_logs}

Expected behavior

In my case, training stops after 2 epochs, whether the validation loss increases or not. The callback behavior should be the same as the default. When I don't pass a custom callback, it works fine. I'm probably doing something wrong.

Environment

  • PyTorch Version: 1.4.0+cu100
  • OS: Ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): pip
  • Python version: 3.6.9
  • CUDA/cuDNN version: 10.0.130/7.6.4
  • GPU models and configuration: GeForce GTX 860M

Thanks!

@adeboissiere added the help wanted (Open to be worked on) label on Jun 11, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@Borda
Member

Borda commented Jun 11, 2020

@jeremyjordan mind having a look? ^^

@Borda added the bug (Something isn't working) label on Jun 11, 2020
@alekseynp

What version of pytorch-lightning are you using?
I see a recent commit, 479ab49, which is only available in master and 0.8.0rc1:
https://github.com/PyTorchLightning/pytorch-lightning/tree/0.8.0rc1

@alekseynp

For the record, I arrived at this issue because in version 0.7.6 I observe early stopping not behaving properly.

@tpanum

tpanum commented Jun 12, 2020

I have been puzzled by a custom callback never stopping training in max mode (even though it clearly should have, judging by the tensorboard logs). I am on 0.7.6. Thanks for pointing it out, @alekseynp. I'll see if an upgrade fixes it.

@DavidRuhe

DavidRuhe commented Jun 14, 2020

Hi. I'm not sure if I should create a separate issue, but there is a very confusing bug regarding early stopping (still in the current master branch). The documentation states

By default early stopping will be enabled if ‘val_loss’ is found in validation_epoch_end()’s return dict. Otherwise training will proceed with early stopping disabled.

However, this is not true due to the following bug.
In callback_config.py we see the following code.

def configure_early_stopping(self, early_stop_callback):
        if early_stop_callback is True or None:
            self.early_stop_callback = EarlyStopping(
                monitor='val_loss',
                patience=3,
                strict=True,
                verbose=True,
                mode='min'
            )
            self.enable_early_stop = True
        elif not early_stop_callback:
            self.early_stop_callback = None
            self.enable_early_stop = False
        else:
            self.early_stop_callback = early_stop_callback
            self.enable_early_stop = True

Unless I'm misunderstanding something, to get the behaviour the documentation describes, the condition should be
if early_stop_callback is True or early_stop_callback is None: and the default argument should be set to None:

early_stop_callback: Optional[Union[EarlyStopping, bool]] = None,

In any case, the 'or None' clause will never make the condition True, so as it stands it is redundant.
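
To illustrate, here is a quick standalone check (plain Python, no Lightning involved) comparing the current condition with the one the documentation implies:

    # `x is True or None` parses as `(x is True) or None`, so the `or None`
    # part only ever contributes a falsy value and can never enable the branch.
    for x in (True, False, None):
        buggy = x is True or None          # condition currently in callback_config.py
        intended = x is True or x is None  # condition the documentation implies
        print(x, bool(buggy), bool(intended))

    # Prints:
    #   True True True
    #   False False False
    #   None False True   <- with the current condition, a None value falls through
    #                        to the elif branch and early stopping stays disabled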

David

@adeboissiere
Author

> What version of pytorch-lightning are you using?
> I see a recent commit, 479ab49, which is only available in master and 0.8.0rc1:
> https://github.com/PyTorchLightning/pytorch-lightning/tree/0.8.0rc1

Hi. I'm using version 0.7.6.

@jeremyjordan
Contributor

jeremyjordan commented Jun 16, 2020

There are a number of issues with early stopping; I have a PR (#1504) out to fix them. I have added a new test to cover your case, @adeboissiere.

@brucemuller

I'm also having issues. It seems to only use the default values regardless of what I pass, and even the default values do not cause any early stopping to occur.

@williamFalcon
Contributor

Currently being worked on in #1504
