clip_gradient with clip_grad_value #5460

Closed
dhkim0225 opened this issue Jan 11, 2021 · 5 comments · Fixed by #6123
Labels
feature (Is an improvement or enhancement), help wanted (Open to be worked on), priority: 1 (Medium priority task)
Comments

dhkim0225 (Contributor) commented Jan 11, 2021

🚀 Feature

Same issue as #4927 and #5456.

The current clip_gradient uses clip_grad_norm; can we add clip_grad_value?

https://github.com/PyTorchLightning/pytorch-lightning/blob/f2e99d617f05ec65fded81ccc6d0d59807c47573/pytorch_lightning/plugins/native_amp.py#L63-L65

============================================================
@tchaton

As far as I know, clip_grad_by_value and clip_grad_by_norm behave differently.
All of the implementations in PL currently use clip_grad_by_norm only.
clip_grad_by_value does not rescale gradients by their norm; it simply clamps each gradient element to a fixed range, which is useful when training a model on noisy data.
Please let me know if you think I'm wrong.

PyTorch clip-by-norm: torch.nn.utils.clip_grad_norm_
PyTorch clip-by-value: torch.nn.utils.clip_grad_value_
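
For illustration, a minimal (non-Lightning) sketch contrasting the two PyTorch utilities; `model` here is just a stand-in `torch.nn.Module` and the thresholds are arbitrary:

```python
import torch
from torch.nn.utils import clip_grad_norm_, clip_grad_value_

model = torch.nn.Linear(4, 2)
model(torch.randn(8, 4)).sum().backward()

# clip-by-norm: rescales ALL gradients together so their global 2-norm is <= 1.0
clip_grad_norm_(model.parameters(), max_norm=1.0)

# clip-by-value: clamps each gradient element into [-0.5, 0.5] independently
clip_grad_value_(model.parameters(), clip_value=0.5)
```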

Sincerely,
Anthony Kim.

dhkim0225 added the feature and help wanted labels on Jan 11, 2021
priancho commented Jan 11, 2021

Moved the following comment from #5456 since reopening the issue was disabled :-)


Hi @tchaton,

There are two popular gradient clipping methods: one that limits the maximum gradient value of each model parameter, and another that rescales gradients based on the p-norm of a (sub-)set of model parameters.

PyTorch Lightning implements the second option, which can be used via the Trainer's gradient_clip_val parameter, as you mentioned.
This clipping algorithm is useful when the overall norm of the gradients is large, but not when only a small subset of model parameters has abnormal gradient values: the global norm then stays reasonably small relative to the total number of parameters, so no clipping is triggered.
I think @dhkim0225 wants an alternative gradient clipping method that uses torch.nn.utils.clip_grad_value_() instead of torch.nn.utils.clip_grad_norm_() for such a scenario.
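
To make that point concrete, here is a rough numeric sketch; the parameter count, gradient values, and thresholds are invented purely for illustration:

```python
import torch
from torch.nn.utils import clip_grad_norm_, clip_grad_value_

p = torch.nn.Parameter(torch.zeros(1_000_000))
p.grad = torch.full_like(p, 0.01)      # "normal" gradients for ~1M parameters
p.grad[:3] = 5.0                       # a few abnormal entries

print(p.grad.norm())                   # ~13.2, still below a max_norm of 15
clip_grad_norm_([p], max_norm=15.0)    # norm is under the threshold, nothing is rescaled
print(p.grad.max())                    # still 5.0

clip_grad_value_([p], clip_value=1.0)  # clamps every element into [-1, 1]
print(p.grad.max())                    # 1.0
```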

By the way, I don't think this functionality would break BC.
What about adding a gradient_clip_algorithm parameter to Trainer, defaulting to 'norm' but settable to 'value'?
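
Hypothetical usage under this proposal (the gradient_clip_algorithm argument does not exist in the Trainer yet):

```python
from pytorch_lightning import Trainer

# current behaviour: clip by global norm
trainer = Trainer(gradient_clip_val=0.5)

# proposed: clip each gradient element by value instead
trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")
```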

tchaton (Contributor) commented Jan 11, 2021

Hey @priancho @dhkim0225 ,

Yes, I understand now. Sounds like a great idea! Would @priancho or @dhkim0225 like to make a PR for this feature?

Best,
T.C

tchaton added the priority: 1 (Medium priority task) label on Jan 11, 2021
tchaton added this to the 1.2 milestone on Jan 11, 2021
dhkim0225 (Contributor, Author) commented:
@tchaton Okay, I will.
@priancho If you want to develop this feature together, please feel free to reach out to me.

dhkim0225 (Contributor, Author) commented:
@tchaton Sorry for bothering you. This is my first time contributing to a large open-source project.
I made a PR for this and would like to ask for your advice on what to do next.
