
Confusion over Early Stopping behavior and how it is intended to work in the future #2083

Closed
Dunrar opened this issue Jun 5, 2020 · 2 comments

Dunrar commented Jun 5, 2020

❓ Questions and Help

What is your question?

So, I intend to use early stopping on train_step and training metrics. There were some problems with this (early stopping being called twice in the training loop, not stopping at all in 'min' mode, not stopping when there is no validation, a missing return in the callback class). Those were fixed quickly, but I still have some problems with current master, and in #1458 early stopping on training metrics seems to have been disabled, if I understand it correctly; the 0.8.0-dev documentation says the same. Changing where the callback is called is still possible, though.

My question is: will early stopping on training metrics be possible going forward? Will calling an EarlyStopping subclass in on_train_end catch training metrics and stop training based on them?

Also, I don't know whether I should open another bug report for my current problem with early stopping before #1504 is merged, which might fix it. I have not changed my code to use a subclass of EarlyStopping; instead I edited the EarlyStopping class so that def on_validation_end(self, trainer, pl_module): returns self._run_early_stopping_check(trainer, pl_module) (which is going to be in #1504 anyway, if I understand correctly). Early stopping seems to work now (despite there being no validation step...), but it stops too early again: not before patience has been reached, but still clearly earlier than it should.
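
For context, the equivalent change expressed as a subclass would look roughly like this (just a sketch; the class name is mine, and I actually patched EarlyStopping in place rather than subclassing it):

from pytorch_lightning.callbacks import EarlyStopping

class TrainMetricEarlyStopping(EarlyStopping):
    # hypothetical subclass mirroring my in-place edit
    def on_validation_end(self, trainer, pl_module):
        # return the result of the check instead of discarding it
        return self._run_early_stopping_check(trainer, pl_module)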

Code

from pytorch_lightning.callbacks import EarlyStopping

# monitor the per-batch training metric; stop when it has not improved
# by at least min_delta for `patience` checks
early_stopping = EarlyStopping(
    monitor='batch/mean_absolute_loss',
    min_delta=hparams.min_delta,
    patience=hparams.patience,
    mode='min',
)

with hparams.patience=150 and hparams.min_delta=0.01, but this happens (epoch/mean_absolute_loss is the mean of all batch/mean_absolute_loss of an epoch, logged in on_epoch_end):

[screenshot: training curve of epoch/mean_absolute_loss for run "BAC-4623 rand_RNN_1 Layers_10 Cells_Prediction Column 15_Run 2", captured 2020-06-05 13:00]

Way too early (provided I understand the expected behavior right), is it not?
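
For reference, the metrics are produced roughly like this (a simplified sketch of my LightningModule; the real model, data, and hparams are omitted, and only the metric names match my actual code):

import torch
import pytorch_lightning as pl

class SketchModule(pl.LightningModule):
    # minimal stand-in for my actual RNN model
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(10, 1)
        self._batch_losses = []

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.l1_loss(self(x), y)
        self._batch_losses.append(loss.detach())
        # per-batch value that EarlyStopping monitors
        return {'loss': loss, 'log': {'batch/mean_absolute_loss': loss}}

    def on_epoch_end(self):
        # epoch/mean_absolute_loss is the mean of all batch values of the epoch
        epoch_loss = torch.stack(self._batch_losses).mean()
        self.logger.log_metrics({'epoch/mean_absolute_loss': epoch_loss.item()},
                                step=self.current_epoch)
        self._batch_losses = []

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)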

Dunrar added the question label Jun 5, 2020

williamFalcon commented Jun 5, 2020

Yes, it will. This is currently being worked on in #1989.

Once this lands, you'll add:

# training_step OR validation_step
return TrainResult(loss, early_stop_on=something_else, checkpoint_on=something_else)
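
In a LightningModule that would look roughly like this (sketch only; the import path and final signature may still shift while #1989 is in review, and compute_loss stands in for your own loss computation):

from pytorch_lightning import TrainResult  # final import path may differ

def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # placeholder for your own loss
    # the same tensor (or any other tensor you track) drives early stopping
    # and checkpointing directly, no validation loop required
    return TrainResult(loss, early_stop_on=loss, checkpoint_on=loss)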

Dunrar commented Jun 5, 2020

Thank you! I like it a lot!

About the current behavior on master: I seem to be able to stop training early on training metrics despite #1458, so that functionality is still there right now, correct? And do you have any idea why my training stops this early?

Dunrar closed this as completed Jun 6, 2020