Add tensor hooks and 10.0 gradient clipping #8598

Merged (16 commits) on Aug 1, 2022

Conversation

@UnglvKitDe (Contributor) commented Jul 16, 2022

Addresses the stability issues described in #8578.

πŸ› οΈ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improves training stability with gradient clipping and NaN mitigation.

📊 Key Changes

  • 🧊 Added a hook to convert NaN gradient values to zero during training, reinforcing model stability.
  • ✂️ Introduced gradient clipping to prevent excessively large gradients, which can corrupt model weights.

🎯 Purpose & Impact

  • πŸ›‘οΈ The NaN conversion to zero avoids potential crashes or instability in training by handling undefined numerical values gracefully.
  • πŸ’ͺ Gradient clipping safeguards the training process by keeping gradients within a manageable range, promoting healthier and more stable weight updates.
  • βš™οΈ These enhancements are geared towards making the YOLOv5 training process more robust and reliable, beneficial for users looking to train models without encountering numerical issues.

@glenn-jocher linked an issue on Jul 16, 2022 that may be closed by this pull request.
@UnglvKitDe (Contributor Author)

@glenn-jocher I need to change the part with the hook; I noticed it isn't quite right. Sorry.

@glenn-jocher (Member)

> @glenn-jocher I need to change the part with the hook; I noticed it isn't quite right. Sorry.

Is this PR ready to test or do you still need to make changes?

@UnglvKitDe (Contributor Author)

@glenn-jocher Yes, it is ready now. I removed the unnecessary line of code; it would not have changed the result, but it was not needed. Thanks a lot!

@UnglvKitDe (Contributor Author)

@glenn-jocher A few insights: I've been running ~50 trainings on my private dataset with this new branch over the last few days. So far this version looks much more stable, and the random seed plays a less important role. No unstable trainings as described above have occurred so far, whereas previously sometimes 3 out of 9 trainings were unstable. I know that is just my dataset and not a general test :)

@glenn-jocher (Member)

@UnglvKitDe I'd like to close this out, but we need a few changes. If this improves default trainings with little cost/resource penalty, then we should enable it by default. We want to avoid adding argparser arguments if at all possible; we already have too many. Can you profile training with this branch and with master to compare mAP, training time, and CUDA memory utilization?

@glenn-jocher (Member)

@UnglvKitDe Also, why do we have to unscale the optimizer's gradients first? This seems like a costly operation; wouldn't it make more sense to scale the gradient clipping value by the scaler settings?

@UnglvKitDe (Contributor Author)

> @UnglvKitDe I'd like to close this out, but we need a few changes. If this improves default trainings with little cost/resource penalty, then we should enable it by default. We want to avoid adding argparser arguments if at all possible; we already have too many. Can you profile training with this branch and with master to compare mAP, training time, and CUDA memory utilization?

@glenn-jocher At the moment I unfortunately don't have the resources to run a full COCO training. My own dataset has only a few hundred images; COCO has ~120k. Sorry! Can you test this? If I can help in any other way, please let me know :)

@UnglvKitDe (Contributor Author)

> @UnglvKitDe Also, why do we have to unscale the optimizer's gradients first? This seems like a costly operation; wouldn't it make more sense to scale the gradient clipping value by the scaler settings?

@glenn-jocher As described in the linked reference, it is necessary to unscale the gradients first, because otherwise the clipping would operate on the wrong (still-scaled) values. I did not notice any significantly longer training on my dataset, but (as said above) my dataset is small. It may also improve training and lead to better results through more stable updates.
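
As a point of reference, below is a minimal, hedged sketch of the unscale-before-clip ordering that the PyTorch AMP (torch.cuda.amp) documentation recommends; the toy model, tensors, and hyperparameters are illustrative stand-ins, not the actual train.py code.

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x, y = torch.randn(8, 10, device=device), torch.randn(8, 1, device=device)
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    loss = nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()

# Gradients still carry the loss-scale factor here, so clipping them against a
# fixed threshold such as 10.0 would be meaningless; unscale them first.
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)

# scaler.step() detects that unscale_() was already called and does not unscale twice.
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()

Unscaling first (rather than scaling the clip threshold) also matters because GradScaler adjusts its scale factor dynamically during training, so a fixed threshold applied to scaled gradients would not correspond to a consistent gradient norm.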

@glenn-jocher (Member)

@UnglvKitDe thanks, I'll review the link and run some tests this weekend.

@glenn-jocher (Member)

@UnglvKitDe It looks like the PR does two separate things: first it replaces NaN with 0.0, and then it separately clips gradients. Do you know which change resulted in the improvements you saw? Are you able to determine which was more important?

@glenn-jocher (Member) commented Aug 1, 2022

@UnglvKitDe I'm testing in Colab now. I see the same training time but slightly worse mAP for the PR. I wonder if the clipping is too tight. We have 10.0 now; maybe try 30.0 or 100.0?

!git clone https://github.com/UnglvKitDe/yolov5-1 -b fix/grad_inf_nan  # clone
%cd yolov5-1
%pip install -qr requirements.txt  # install
!python train.py --img 640 --batch 16 --epochs 30 --data coco128.yaml --weights yolov5s.pt --cache

%cd ..
!git clone https://github.com/ultralytics/yolov5  # clone master for comparison
%cd yolov5
!python train.py --img 640 --batch 16 --epochs 30 --data coco128.yaml --weights yolov5s.pt --cache

@glenn-jocher (Member)

@UnglvKitDe I confirm that raising the clip max to 100.0 solves the lower-mAP issue. Do you think this might be too high?

@glenn-jocher removed the TODO label on Aug 1, 2022
@UnglvKitDe (Contributor Author)

> @UnglvKitDe It looks like the PR does two separate things: first it replaces NaN with 0.0, and then it separately clips gradients. Do you know which change resulted in the improvements you saw? Are you able to determine which was more important?

@glenn-jocher For me, both in combination improved the results.

@UnglvKitDe (Contributor Author)

@glenn-jocher About your changes: I don't know to what extent the parameters can actually be None; strictly speaking that makes no sense. It was advised once in a forum, but it may relate to an old, since-fixed bug. I have seen values for max_value from 2.0 to 5.0; 10 was already a very high choice on my part. I find 100 very, very high...

@UnglvKitDe (Contributor Author)

@glenn-jocher I think the value that is needed depends on the context, so I wanted to keep it configurable. I agree with you, though, that we already have too many argparser arguments. Maybe we could put this in a config file (a hyp*.yaml or the model *.yaml?).

@glenn-jocher changed the title from "Add tensor hooks and gradient clipping #8578" to "Add tensor hooks and 10.0 gradient clipping" on Aug 1, 2022
@glenn-jocher merged commit 0669f1b into ultralytics:master on Aug 1, 2022
@glenn-jocher (Member)

@UnglvKitDe I reduced the gradient clipping max to 10.0 after some experiments found not much difference from master.

PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐

@UnglvKitDe (Contributor Author)

@glenn-jocher Thx :)

@glenn-jocher (Member)

@UnglvKitDe I observed erratic training behavior (green line) with the nan_to_num hook in the classifier branch (I added it there as well), so I'm going to remove it from master.

[Screenshot: training-metrics plot (Aug 1, 2022) showing the erratic run as the green line]

@UnglvKitDe (Contributor Author) commented Aug 4, 2022

@glenn-jocher Hm, interesting; I have never seen anything like this. I see you kept my code except for the None check. I don't know whether that is the reason, but actually none of the weights should be None, and if they are, an exception should be thrown...

ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022
* Add tensor hooks and gradient clipping ultralytics#8578

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove retain_grad(), because its not necessary

* Update train.py

* Simplify

* Update train.py

* Update train.py

* Update train.py

* Update train.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@glenn-jocher (Member)

@UnglvKitDe I agree, the None checks should not be necessary, and the issue should be investigated further to ensure all weights are properly initialized and exceptions are handled appropriately. Thank you for bringing this to my attention.

Successfully merging this pull request may close the linked issue: NaNs and INFs in gradient values (#8578).