Fixed resuming training with DDP when average_best_models=true #1111

Merged on Jun 1, 2023 (1 commit)

Conversation

@shaydeci (Collaborator) commented Jun 1, 2023

  • At line 52 the state dict was saved even when it had just been read from disk. Saving is only needed the first time the state_dict of the snapshots is created, so the save call has been moved inside the else branch of the check for whether we are resuming.
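
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `init_snapshots_state`, the dictionary keys, and the use of `pickle` are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def init_snapshots_state(path, resume, num_snapshots=3):
    """Hypothetical helper: load the snapshots state on resume, else create and save it."""
    if resume:
        # Resuming: the state already exists on disk, so only read it.
        with open(path, "rb") as f:
            state = pickle.load(f)
    else:
        # Fresh run: build the initial state (keys are illustrative only)...
        state = {"snapshots_metric": [None] * num_snapshots, "averaged_model_metric": None}
        # ...and save it here, inside the else branch, so a resume never rewrites the file.
        with open(path, "wb") as f:
            pickle.dump(state, f)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state_path = os.path.join(tmp, "snapshots_state.pkl")
        init_snapshots_state(state_path, resume=False)
        written_at = os.path.getmtime(state_path)
        init_snapshots_state(state_path, resume=True)
        # The resume call only reads, so the file's modification time is unchanged.
        print(os.path.getmtime(state_path) == written_at)
```

With the save kept outside the branch, the second call would write back a file it had just loaded; placing it in the else branch limits the write to the first creation.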

@shaydeci marked this pull request as ready for review June 1, 2023 07:40

@Louis-Dupont (Contributor) left a comment

LGTM

@shaydeci merged commit aed7e96 into master Jun 1, 2023
1 check passed
@shaydeci deleted the bug/SG-911_fix_ddp_resume_with_average_modeling branch June 1, 2023 08:19

2 participants