Propose to save new model before deleting previous ones in ModelCheckpointing #1150
Labels
feature
Is an improvement or enhancement
help wanted
Open to be worked on
let's do it!
approved to implement
🚀 Feature
In an edge case, the trainer deleted the previous model and was then killed by a system error before it had successfully saved the new one. As a result, all the models were lost.
I understand that specifying save_top_k > 1 helps, and that saving before deleting leads to higher disk consumption. But it might be good to provide an option for this?
Motivation
Pitch
Save the new model before deleting the previous one. In the worst case, you have two models on disk but never none.
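To make the proposed ordering concrete, here is a minimal sketch in plain Python. It is not the actual ModelCheckpoint implementation: `save_then_delete`, `save_fn`, `new_path`, and `old_path` are hypothetical names used only for illustration. The idea is to write the new file fully (via a temporary file and an atomic rename) and to remove the previous file only after that has succeeded.

```python
import os
import tempfile


def save_then_delete(save_fn, new_path, old_path=None):
    """Hypothetical helper: write the new checkpoint first, delete the old one last.

    save_fn is any callable that writes the checkpoint bytes to a binary file object.
    """
    directory = os.path.dirname(os.path.abspath(new_path))
    # Write to a temporary file in the same directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            save_fn(f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, new_path)
    except BaseException:
        # Saving failed: clean up the partial file and keep the old checkpoint.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Only now is it safe to drop the previous checkpoint.
    if old_path is not None and old_path != new_path and os.path.exists(old_path):
        os.remove(old_path)
```

If the process dies at any point before the rename, the previous checkpoint is still on disk. The cost is the one mentioned above: for a short time, two checkpoints occupy disk space.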
Alternatives
Additional context