Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Way to fix bug with validation frequency #1601

Merged
merged 14 commits into from
Nov 8, 2023

Conversation

hakuryuu96
Copy link
Contributor

@hakuryuu96 hakuryuu96 commented Nov 1, 2023

https://github.com/Deci-AI/super-gradients/issues/1324

This PR addresses the issue above. The behavior of saving checkpoints changes when max_epochs is not divisible by val_frequency. In this case, ckpt_latest won't be checked whether it is the best. A corresponding warning was added.

Copy link
Contributor

@Louis-Dupont Louis-Dupont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just very small note in the test

tests/end_to_end_tests/trainer_test.py Outdated Show resolved Hide resolved
@hakuryuu96
Copy link
Contributor Author

hakuryuu96 commented Nov 2, 2023

After a discussion with @BloodAxe we decided:

  1. to validate the latest epoch
  2. to validate the epochs in save_ckpt_list even if they are not divisible by val_freq

Corresponding changes have been added in the last commit.

Copy link
Collaborator

@BloodAxe BloodAxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BloodAxe BloodAxe merged commit 270f7b2 into Deci-AI:master Nov 8, 2023
6 of 7 checks passed
Louis-Dupont pushed a commit that referenced this pull request Nov 9, 2023
* Way to fix bug with validation frequency

* Fixed test, the state of net was rewritten

* Added validating the latest epoch and epochs from save_ckpt_epoch_list

* Added one more testcase to check wether latest notdivisible epoch has valid in metrics

* Following the SRP recommendation...

* Which inference time exactly

* Fixed incorrect keyword in writing function

* Missing brackets around epoch+1 in valid run check function.

* Final fixes hopefully :)

* Fixed trainer to add scalars only in main process

---------

Co-authored-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
Louis-Dupont added a commit that referenced this pull request Nov 13, 2023
* first working version without test

* add

* add tests

* move to _get_pipeline

* fix

* fix test

* Way to fix bug with validation frequency (#1601)

* Way to fix bug with validation frequency

* Fixed test, the state of net was rewritten

* Added validating the latest epoch and epochs from save_ckpt_epoch_list

* Added one more testcase to check wether latest notdivisible epoch has valid in metrics

* Following the SRP recommendation...

* Which inference time exactly

* Fixed incorrect keyword in writing function

* Missing brackets around epoch+1 in valid run check function.

* Final fixes hopefully :)

* Fixed trainer to add scalars only in main process

---------

Co-authored-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>

* add images and update autopadding responsability

* add example of visualization w/o resizing

* add docstring

* remove unwanted prints

* add explicit auto_paddign

---------

Co-authored-by: hakuryuu96 <marchenkophilip@gmail.com>
Co-authored-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants