[Question]: Resume training #3458
Comments
Hi guys. I faced this exact issue too. Is there a solution in the end?
Hi @alfredwallace7. For the documentation:
Hi again. To my own surprise, I managed to do it, @alfredwallace7. Then you can create the trainer object as usual.
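The workaround described above (load the checkpointed model, then create the trainer object as usual) is an instance of the general checkpoint-and-resume pattern: persist the model state and the epoch counter after every epoch, then reload both and continue training up to a new epoch limit. Below is a minimal, library-agnostic sketch of that pattern; `ToyTrainer` and its file layout are hypothetical illustrations for clarity, not Flair's actual `ModelTrainer` API:

```python
import pickle


class ToyTrainer:
    """Minimal stand-in for a trainer that supports checkpoint-and-resume.

    Flair's real ModelTrainer differs; this only illustrates the pattern.
    """

    def __init__(self, weights=0.0, start_epoch=0):
        self.weights = weights
        self.epoch = start_epoch

    def train(self, max_epochs, checkpoint_path):
        """Train from the current epoch up to max_epochs, checkpointing each epoch."""
        while self.epoch < max_epochs:
            self.epoch += 1
            self.weights += 1.0  # stand-in for one epoch of weight updates
            # Persist everything needed to resume: model state and epoch counter.
            with open(checkpoint_path, "wb") as f:
                pickle.dump({"weights": self.weights, "epoch": self.epoch}, f)

    @classmethod
    def load_checkpoint(cls, checkpoint_path):
        """Rebuild a trainer from a saved checkpoint so training can continue."""
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
        return cls(weights=state["weights"], start_epoch=state["epoch"])


# Train to epoch 10, "interrupt", then resume from the checkpoint up to epoch 25.
trainer = ToyTrainer()
trainer.train(10, "checkpoint.pkl")

resumed = ToyTrainer.load_checkpoint("checkpoint.pkl")
resumed.train(25, "checkpoint.pkl")  # continues at epoch 11, not from scratch
```

The key design point is that the checkpoint stores the epoch counter alongside the model state, so the resumed run stops at the same absolute epoch limit it would have reached uninterrupted.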
Thanks for your replies. I'll read the docs fully and try the hack!
I would add that resuming is important if you're training models on AWS and want to use spot instances: they need to be able to be interrupted and continue from a checkpoint automatically.
Question
I'm trying to resume training according to this code, where it says:

```python
# 7. continue training at later point. Load previously trained model checkpoint, then resume
trained_model = SequenceTagger.load(path + '/checkpoint.pt')

# resume training best model, but this time until epoch 25
trainer.resume(trained_model,
               base_path=path + '-resume',
               max_epochs=25,
               )
```
but `resume` is not defined in:

```python
class ModelTrainer(Pluggable)
```
I'm sure this is a common task with your awesome library, yet I cannot get it working. Any information would be much appreciated.