
Early Stopping behavior #1751

Closed · marcopodda opened this issue May 7, 2020 · 10 comments · Fixed by #1863

Labels: bug (Something isn't working), help wanted (Open to be worked on)

Comments

marcopodda commented May 7, 2020

Hi there,
thanks for the great library (I am using 0.7.5). I am not following the bug report template because I'm not sure whether this is actually a bug or I simply misunderstand how early stopping is implemented. My code looks as follows:

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import EarlyStopping

    early_stop_callback = EarlyStopping(
        monitor='val_acc',
        min_delta=0.0,
        patience=80,
        verbose=True,
        mode=self.mode  # set elsewhere in my class ('max' for accuracy)
    )

    trainer = Trainer(
        early_stop_callback=early_stop_callback,
        auto_select_gpus=True,
        max_epochs=200,
        terminate_on_nan=True,
        show_progress_bar=True,
        fast_dev_run=False,
        gpus=1
    )

As I understand it, the model should perform early stopping only after AT LEAST 80 epochs have passed without improvement in validation accuracy. However, in my case, early stopping happened at epoch 75. Is this how it should be?

As I said, I am not sure whether this is actually a bug or a design choice (perhaps early stopping is implemented at the batch level?). If it is indeed a bug, I will work up a reproducible example. Thank you!
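
In pseudocode, the patience behavior I expect is roughly the following (validate(), max_epochs, min_delta and patience are placeholders here, not actual Lightning internals):

    # Expected semantics: stop only after `patience` consecutive epochs
    # without improvement on the monitored metric.
    best_score = float("-inf")  # mode='max': higher val_acc is better
    wait = 0
    for epoch in range(max_epochs):
        val_acc = validate()  # placeholder for one validation pass
        if val_acc > best_score + min_delta:
            best_score = val_acc
            wait = 0
        else:
            wait += 1
        if wait >= patience:  # with patience=80, at least 80 epochs must pass first
            break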

marcopodda added the bug (Something isn't working) and help wanted (Open to be worked on) labels on May 7, 2020
github-actions bot commented May 7, 2020

Hi! Thanks for your contribution, great first issue!

devforfu commented May 7, 2020

I would expect it to iterate for at least 80 epochs, too. So to me, this looks like a bug or some kind of unexpected behavior. It would be great to figure it out!

marcopodda (Author)

OK then, I'll put together a notebook to see if I can reproduce it.

marcopodda (Author)

Thanks @mateuszpieniak
Here is a working example. As you can see, it stops at epoch 41 even though patience is set to 80.
https://github.com/marcopodda/pl-es-example/blob/master/ES%20example.ipynb

elkotito (Contributor) commented May 7, 2020

It is definitely a bug. I discovered that EarlyStopping.on_epoch_end is executed twice within one epoch, which means that setting patience=160 should work around your issue temporarily.

In the file training_loop.py:
First call:

            if self.fast_dev_run or should_check_val:
                self.run_evaluation(test_mode=self.testing)
                self.call_checkpoint_callback()
                self.call_early_stop_callback()

Second call:

                # TODO wrap this logic into the callback
                if self.enable_early_stop:
                    if (met_min_epochs and met_min_steps) or self.fast_dev_run:
                        should_stop = self.early_stop_callback.on_epoch_end(self, self.get_model())
                        # stop training
                        stop = should_stop and met_min_epochs
                        if stop:
                            self.run_training_teardown()
                            return
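
To see the effect, here is a minimal, self-contained simulation (illustrative only, not Lightning code) of a patience counter that gets bumped twice per epoch:

    # With the check firing twice per epoch, patience=80 is exhausted
    # after roughly 40 non-improving epochs instead of 80.
    patience = 80
    wait = 0
    stopped_at = None
    for epoch in range(200):
        for _ in range(2):  # on_epoch_end runs twice per epoch (the bug)
            wait += 1
        if wait >= patience:
            stopped_at = epoch
            break
    print(stopped_at)  # 39 -> stops around epoch 40, not epoch 80

That is also why doubling patience to 160 restores the intended behavior until the duplicate call is removed.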

Anjum48 commented May 8, 2020

I upgraded to the bleeding-edge version yesterday and can confirm that this started happening to me too. I didn't have this issue before I upgraded (I think I was on 0.7.3 before?).

ricpruss
Yep, we ran into this as well. It is called once in the trainer and once in the on_epoch_end callback.

Borda (Member) commented May 11, 2020

@Anjum48 @ricpruss mind sending a fix as a PR?

elkotito (Contributor)

@Borda Well, I would love to make my first PL PR, if that's okay? 😉

Borda (Member) commented May 11, 2020

@mateuszpieniak sure go ahead! 🚀
