Valid accuracy #10

Closed · shengzhang90 opened this issue Sep 27, 2023 · 4 comments

shengzhang90 commented Sep 27, 2023

Hello, sorry to bother you.

I have encountered a strange issue: the validation accuracy dropped sharply (89.9% -> 50%) in the middle of training, then continued to increase slowly. Is this reasonable?

Thanks a lot.

saikat-roy (Member) commented

Hi @shengzhang90. This usually shouldn't happen. Could you provide more information, such as the type of task you were training on, whether the training finished with decent accuracy, and which configuration of the model you were training?

shengzhang90 (Author) commented

Hi @saikat-roy,
I am training on lung tubular structure segmentation, with online validation during training:
(1) with our customized setting (block_counts: '2,2,3,3,3,3,3,2,2'), the training shows a mean loss of 1.48421 (learn_rate: 0.00277765) at the 47th epoch, which is higher than the other settings;
(2) with the default base setting (block_counts: '2,2,2,2,2,2,2,2,2'), the training shows a mean loss of 0.599779 (learn_rate: 0.00277765);
(3) with our previous setting on a different backbone, the training shows a mean loss of 0.55954 (learn_rate: 0.00277765);
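
For reference, here is a minimal sketch of how a custom block_counts like ours might be passed when building the MedNeXt architecture directly. The argument names below follow my reading of the repo's README and are assumptions, so the exact constructor signature should be checked against the MedNeXt source:

```python
from nnunet_mednext import MedNeXt  # MedNeXt architecture from this repo

# Custom configuration with block_counts 2,2,3,3,3,3,3,2,2
# (9 entries: 4 encoder stages, the bottleneck, 4 decoder stages).
# All argument names/values here are assumptions for illustration only.
model = MedNeXt(
    in_channels=1,                           # single-channel CT input
    n_channels=32,                           # base number of feature channels
    n_classes=2,                             # tubular structure vs. background
    exp_r=2,                                 # expansion ratio inside each block
    kernel_size=3,
    deep_supervision=True,
    do_res=True,                             # residual connections in blocks
    do_res_up_down=True,                     # residual up/down-sampling blocks
    block_counts=[2, 2, 3, 3, 3, 3, 3, 2, 2],
)
```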

Thanks a lot.

saikat-roy (Member) commented

@shengzhang90 I assume your mean loss is the main metric you are using to judge accuracy. Are you using your own training pipeline, and are you using mixed precision? I can at least guess at some things that might be happening:

  1. It sometimes does happen in the native nnUNet training pipeline (with the MedNeXt backbone) that the accuracy decreases a little bit (not as much as yours, I think) in the middle of training, but it usually recovers by the end of training; the nnUNet default of 1000 epochs is a long time for it to recover.
  2. While training on some datasets, such as KiTS19, we had to reduce the initial learning rate because the loss became NaN in a number of cases (similar to this or this). This seemingly happens due to our use of mixed precision via AMP. We do not have a solution for this at the moment other than using a smaller learning rate in these datasets/situations to avoid the problem (a generic sketch of this workaround is shown after this list).
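
To illustrate point 2, here is a generic PyTorch sketch of a mixed-precision (AMP) training step that uses a reduced initial learning rate and skips batches with a non-finite loss. This is not the repo's actual trainer code; model, train_loader and loss_fn are placeholders for your own pipeline, and the hyperparameters are assumptions for illustration:

```python
import math
import torch

device = torch.device("cuda")
model = model.to(device)  # placeholder: your MedNeXt (or other) network

# Reduced initial learning rate (e.g. 1e-3 instead of the usual 1e-2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.99, nesterov=True, weight_decay=3e-5)
scaler = torch.cuda.amp.GradScaler()  # loss scaling for mixed precision

for images, targets in train_loader:  # placeholder data loader
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad(set_to_none=True)

    with torch.cuda.amp.autocast():   # forward pass in mixed precision
        logits = model(images)
        loss = loss_fn(logits, targets)

    if not math.isfinite(loss.item()):  # simple guard: skip NaN/Inf batches
        print("Non-finite loss encountered, skipping this batch")
        continue

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                              # clip on unscaled grads
    torch.nn.utils.clip_grad_norm_(model.parameters(), 12)  # nnUNet-style clipping
    scaler.step(optimizer)
    scaler.update()
```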

Let me know if these suggestions help you with your problem.

shengzhang90 (Author) commented Oct 1, 2023

Hi @saikat-roy,

  1. I did not use AMP, since it easily leads to NaN;
  2. I used my own training pipeline with the MedNeXt backbone;
  3. Just as you said, the accuracy recovered by the end of training.

Now I think my training was reasonable.

Thanks a lot for your prompt reply.
