Valid accuracy #10

Closed · shengzhang90 opened this issue Sep 27, 2023 · 4 comments

shengzhang90 commented Sep 27, 2023

Hello, sorry to bother you.

I have encountered a strange issue: the validation accuracy dropped sharply (89.9% -> 50%) in the middle of training, then continued to increase slowly. Is this reasonable?

Thanks a lot.

saikat-roy (Member) commented

Hi @shengzhang90. This usually shouldn't happen. Could you provide more information, such as the type of task you were training on, whether the training finished with decent accuracy, and which configuration of the model you were training?

shengzhang90 (Author) commented

Hi @saikat-roy,
I am training on lung tubular structure segmentation, with online validation during training:
(1) with our customized setting (block_counts: '2,2,3,3,3,3,3,2,2'), the training shows a mean loss of 1.48421 (learn_rate: 0.00277765) at the 47th epoch, which is higher than the other settings;
(2) with the default base setting (block_counts: '2,2,2,2,2,2,2,2,2'), the training shows a mean loss of 0.599779 (learn_rate: 0.00277765);
(3) with our previous setting on a different backbone, the training shows a mean loss of 0.55954 (learn_rate: 0.00277765);
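
For reference, here is a minimal sketch of how a custom block_counts like ours might be passed when building the MedNeXt architecture directly. The argument names below follow my reading of the repo's README and are assumptions, so the exact constructor signature should be checked against the MedNeXt source:

```python
from nnunet_mednext import MedNeXt  # MedNeXt architecture from this repo

# Custom configuration with block_counts 2,2,3,3,3,3,3,2,2
# (9 entries: 4 encoder stages, the bottleneck, 4 decoder stages).
# All argument names/values here are assumptions for illustration only.
model = MedNeXt(
    in_channels=1,                           # single-channel CT input
    n_channels=32,                           # base number of feature channels
    n_classes=2,                             # tubular structure vs. background
    exp_r=2,                                 # expansion ratio inside each block
    kernel_size=3,
    deep_supervision=True,
    do_res=True,                             # residual connections in blocks
    do_res_up_down=True,                     # residual up/down-sampling blocks
    block_counts=[2, 2, 3, 3, 3, 3, 3, 2, 2],
)
```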

Thanks a lot.

saikat-roy (Member) commented

@shengzhang90 I assume your mean loss is the main metric you are using to judge accuracy. Are you using your own training pipeline, and are you using mixed precision? I can at least guess at some things that might be happening:

  1. It sometimes does happen in the native nnUNet training pipeline (with the MedNeXt backbone) that the accuracy decreases a little bit (not as much as yours, I think) in the middle of training, but it usually recovers by the end of training; the nnUNet default of 1000 epochs is a long time for it to recover.
  2. While training on some datasets, such as KiTS19, we had to reduce the initial learning rate because the loss became NaN in a number of cases (similar to this or this). This seemingly happens due to our use of mixed precision via AMP. We do not have a solution for this at the moment other than using a smaller learning rate in these datasets/situations to avoid the problem (a generic sketch of this workaround is shown after this list).
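
To illustrate point 2, here is a generic PyTorch sketch of a mixed-precision (AMP) training step that uses a reduced initial learning rate and skips batches with a non-finite loss. This is not the repo's actual trainer code; model, train_loader and loss_fn are placeholders for your own pipeline, and the hyperparameters are assumptions for illustration:

```python
import math
import torch

device = torch.device("cuda")
model = model.to(device)  # placeholder: your MedNeXt (or other) network

# Reduced initial learning rate (e.g. 1e-3 instead of the usual 1e-2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.99, nesterov=True, weight_decay=3e-5)
scaler = torch.cuda.amp.GradScaler()  # loss scaling for mixed precision

for images, targets in train_loader:  # placeholder data loader
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad(set_to_none=True)

    with torch.cuda.amp.autocast():   # forward pass in mixed precision
        logits = model(images)
        loss = loss_fn(logits, targets)

    if not math.isfinite(loss.item()):  # simple guard: skip NaN/Inf batches
        print("Non-finite loss encountered, skipping this batch")
        continue

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                              # clip on unscaled grads
    torch.nn.utils.clip_grad_norm_(model.parameters(), 12)  # nnUNet-style clipping
    scaler.step(optimizer)
    scaler.update()
```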

Let me know if these suggestions help you with your problem.

shengzhang90 (Author) commented Oct 1, 2023

Hi @saikat-roy,

  1. I did not use AMP, since it easily leads to NaN;
  2. I used my own training pipeline with the MedNeXt backbone;
  3. Just as you said, the accuracy recovered by the end of training.

Now I think my training was reasonable.

Thanks a lot for your prompt reply.
