
fix: fix qadam NAN problem #654

Merged 4 commits into master on Feb 4, 2023

Conversation

@wangraying (Member) commented on Dec 5, 2022:

fix #485

@wangraying changed the title from "fix: fix qadam nan problem" to "fix: fix qadam NAN problem" on Dec 5, 2022
@wangraying (Member Author) commented:

This PR is ready; please take a look.

@ganshaoduo (Contributor) commented:

I think maybe it's better to keep the 'weight_decay' and set it to 0 by default, for consistency with the original Adam optimizer.
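
For illustration, a minimal sketch of what keeping the argument could look like; the class name and defaults below mirror `torch.optim.Adam` and are hypothetical, not QAdam's actual constructor:

```python
import torch

class QAdamLikeOptimizer(torch.optim.Optimizer):
    # Illustrative only: weight_decay defaults to 0, matching
    # torch.optim.Adam, so existing users see no behavior change.
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0, warmup_steps=100):
        defaults = dict(lr=lr, betas=betas, eps=eps,
                        weight_decay=weight_decay, warmup_steps=warmup_steps)
        super().__init__(params, defaults)
```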

@wangraying (Member Author) commented on Jan 3, 2023:

> I think maybe it's better to keep the 'weight_decay' and set it to 0 by default, for consistency with the original Adam optimizer.

Thanks @ganshaoduo. I'm wondering how we should use weight_decay when calculating the momentum; the relevant code is here. The original paper does not discuss the case with weight_decay, so I am not sure whether the method still converges in theory. In fact, I ran some simple tests, and the performance seems worse (though I did not tune the parameters much).
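
To make the question concrete, here is a hedged sketch of how classic (coupled) Adam folds weight_decay into the gradient before the momentum and variance updates; all names here are illustrative, not the linked Bagua code:

```python
import torch

def adam_step_with_coupled_decay(param, grad, m, v, step,
                                 lr=1e-3, beta1=0.9, beta2=0.999,
                                 eps=1e-8, weight_decay=0.0):
    # Coupled (L2-style) decay: the decay term enters the gradient first,
    # so it is also accumulated into the momentum m and variance v.
    if weight_decay != 0:
        grad = grad + weight_decay * param
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = m / (1 - beta1 ** step)  # bias correction, step >= 1
    v_hat = v / (1 - beta2 ** step)
    param.sub_(lr * m_hat / (v_hat.sqrt() + eps))
    return param, m, v
```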

@ganshaoduo (Contributor) commented:

Well spotted! Actually, in the current implementation we apply weight_decay only in the warm-up phase, not in the compression phase: the change to the gradient only takes effect when step_id < self.warmup_steps. However, when we calculate the momentum here, weight_decay is ignored.

As for the necessity of weight_decay, I would say it has no direct impact on convergence. All it does is add a small term to the gradient, which should not hurt the theoretical analysis of QAdam in my view.

So the question is whether we add the weight_decay operation here, so that the same weight decay is applied in both the warm-up and compression phases, OR we delete the weight_decay operation entirely, so that weight decay is not considered at all. I would prefer the former; see the sketch below.
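
A minimal sketch of the former option, assuming placeholder names (step_id, warmup_steps, m, v) rather than the actual Bagua implementation:

```python
import torch

def qadam_like_update(param, grad, m, v, step_id, warmup_steps,
                      beta1=0.9, beta2=0.999, weight_decay=0.0):
    # Hypothetical helper: apply weight_decay unconditionally so the
    # warm-up and compression phases see the same effective gradient.
    if weight_decay != 0:
        grad = grad + weight_decay * param
    if step_id < warmup_steps:
        # Warm-up phase: full Adam statistics are maintained.
        m.mul_(beta1).add_(grad, alpha=1 - beta1)
        v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    else:
        # Compression phase: only the momentum is updated (and would be
        # quantized before communication); it now carries the decay too.
        m.mul_(beta1).add_(grad, alpha=1 - beta1)
    return m, v
```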

@wangraying (Member Author) commented:

Thanks for your detailed explanation, Shaoduo. Yes, I added weight_decay in the first way you mentioned, but performance drops on the synthetic benchmark. Since this PR is about fixing the NaN problem in QAdam, how about we remove weight_decay here for now and open an issue to track it? We could add it back once it is fully validated.

@wangraying (Member Author) commented on Jan 18, 2023:

The current weight_decay implementation was added by me during the last refactoring; our version #0 did not have it, and we have not used it in our benchmarks or published results. Alternatively, keeping the current implementation unchanged might also be a good option.

Any opinions from other reviewers?

@wangraying (Member Author) commented:

After discussing offline, we decided to roll back the weight_decay modifications.

@woqidaideshi merged commit 12562b0 into master on Feb 4, 2023
Development

Successfully merging this pull request may close these issues:

check qadam NAN problem (#485)