
Change optimizer parameters group method #1239

Conversation

@hoonyyhoon (Contributor) commented Oct 29, 2020

What does this PR do?

This PR changes how model parameters are divided into the three optimizer parameter groups.
The current approach groups parameters by matching their names, which leaves room for a potential bug.
Replacing the name matching with hasattr and isinstance checks prevents potential mis-grouping caused by naming and gives a more general grouping method.

Test

Simple test code that checks both implementations produce the same grouping by appending module names to lists and comparing them.
Tested on yolov5x.yaml, yolov5l.yaml, yolov5m.yaml, yolov5s.yaml.

pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
pg0_old, pg1_old, pg2_old = [], [], []

# Changes
for k, v in model.named_modules():
    if hasattr(v, 'bias') and isinstance(v.bias, torch.Tensor):
        pg2.append(k + '.bias')
    if isinstance(v, nn.BatchNorm2d):
        pg0.append(k + '.weight')
    elif hasattr(v, 'weight') and isinstance(v.weight, torch.Tensor):
        pg1.append(k + '.weight')

# Current
for k, v in model.named_parameters():
    v.requires_grad = True
    if '.bias' in k:
        pg2_old.append(k)  # biases
    elif '.weight' in k and '.bn' not in k:
        pg1_old.append(k)  # apply weight decay
    else:
        pg0_old.append(k)

# Test
assert pg0 == pg0_old
assert pg1 == pg1_old
assert pg2 == pg2_old
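
For context (not part of the comparison test above): in train.py the three groups hold the parameters themselves rather than their names, and only pg1 receives weight decay. A minimal sketch of how they would feed the optimizer, assuming the usual hyp['lr0'], hyp['momentum'] and hyp['weight_decay'] hyperparameters:

import torch.optim as optim

# Sketch only: pg0 (BatchNorm weights) and pg2 (biases) get no weight decay, pg1 does.
optimizer = optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True)
optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']})  # weights with decay
optimizer.add_param_group({'params': pg2})  # biases, no decay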

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Refines layer freezing and parameter grouping in the YOLOv5 training script. 🛠️

📊 Key Changes

  • Removed the previously unused freeze list containing parameter names to freeze.
  • Enabled training (requires_grad=True) for all layers by default.
  • Simplified the logic for freezing layers based on the freeze list (see the sketch after this list).
  • Adjusted the optimizer parameter grouping to differentiate between biases, batch normalization weights, and other weights more accurately.
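
A hedged sketch of the simplified freeze pattern described above (the freeze variable and print format are illustrative, not necessarily the exact merged code):

freeze = []  # e.g. ['model.0.', 'model.1.'] to freeze the first layers
for k, v in model.named_parameters():
    v.requires_grad = True  # train all layers by default
    if any(x in k for x in freeze):
        print(f'freezing {k}')
        v.requires_grad = False  # freeze parameters whose names match the freeze list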

🎯 Purpose & Impact

  • The changes ensure all layers are trainable by default, offering flexibility to freeze specific layers if needed. 🗃️
  • The update provides clearer distinction among parameter groups, which may aid in the optimization process and allow more precise tuning of the training routine. ⚙️
  • Users may experience a more intuitive setup when customizing which parts of their model to train, potentially leading to improved model performance. 🎯

@github-actions github-actions bot (Contributor) left a comment

Hello @hoonyyhoon, thank you for submitting a PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • Verify your PR is up-to-date with origin/master. If your PR is behind origin/master, update by running the following, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
  • Verify all Continuous Integration (CI) checks are passing.
  • Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@hoonyyhoon hoonyyhoon force-pushed the change_optimizer_param_group_method branch from 17e2707 to 777b1e1 on October 29, 2020 09:45
@glenn-jocher (Member)

@hoonyyhoon very interesting! Thanks for the PR.

@glenn-jocher glenn-jocher self-requested a review October 30, 2020 15:31
@glenn-jocher (Member)

@hoonyyhoon should torch.Tensor be replaced with nn.Parameter?

This might prevent non-gradient tensors from joining the param groups. There are examples of this in the Detect() layer, which holds the anchors as registered buffers rather than as parameters (they are not modified by the loss).
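
For illustration (an assumed, simplified example, not code from the PR): a registered buffer passes an isinstance(..., torch.Tensor) check but fails isinstance(..., nn.Parameter), so the stricter check keeps buffers such as the anchors out of the optimizer groups.

import torch
import torch.nn as nn

class TinyDetect(nn.Module):  # simplified stand-in for the real Detect() layer
    def __init__(self):
        super().__init__()
        self.register_buffer('anchors', torch.zeros(3, 2))  # buffer: a Tensor, not a Parameter
        self.m = nn.Conv2d(128, 255, 1)                      # weight/bias are nn.Parameter

d = TinyDetect()
print(isinstance(d.anchors, torch.Tensor))   # True  -> would pass a torch.Tensor check
print(isinstance(d.anchors, nn.Parameter))   # False -> excluded by an nn.Parameter check
print(isinstance(d.m.weight, nn.Parameter))  # True  -> real parameters are still grouped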

@hoonyyhoon (Contributor, Author) commented Oct 31, 2020

@glenn-jocher
Here is a simple printout of named_modules:

for k, v in model.named_modules():
    print(f"{k} // {v}")
...
...
model.23.m.0 // Bottleneck(
  (cv1): Conv(
    (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): Hardswish()
  )
  (cv2): Conv(
    (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
    (act): Hardswish()
  )
)
model.23.m.0.cv1 // Conv(
  (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
  (act): Hardswish()
)
model.23.m.0.cv1.conv // Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
model.23.m.0.cv1.bn // BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv1.act // Hardswish()
model.23.m.0.cv2 // Conv(
  (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn): BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
  (act): Hardswish()
)
model.23.m.0.cv2.conv // Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
model.23.m.0.cv2.bn // BatchNorm2d(256, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
model.23.m.0.cv2.act // Hardswish()

## Detect module

model.24 // Detect(
  (m): ModuleList(
    (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
    (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
  )
)
model.24.m // ModuleList(
  (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
  (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
  (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
)
model.24.m.0 // Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
model.24.m.1 // Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
model.24.m.2 // Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))

As you already know, named_modules() iterates over the modules recursively.
The hasattr check verifies whether a module actually has the attribute in question (e.g. bias or weight), and the second condition (isinstance) is needed because some modules hold bias or weight as None (e.g. nn.Conv2d with bias=False).
Unless a register_buffer is named weight or bias this wouldn't be a problem, and I thought that would be a really rare case.
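
A quick illustration of the None case mentioned above (assumed example):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, bias=False)
print(hasattr(conv, 'bias'))                # True  -> hasattr alone is not enough
print(conv.bias is None)                    # True  -> the attribute exists but is None
print(isinstance(conv.bias, torch.Tensor))  # False -> the isinstance check filters it out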

That said, what you suggested seems safer to me as well.

So TL;DR, your suggestion seems better to me.
Thank you for reviewing!

@hoonyyhoon (Contributor, Author)

Tested with the following assertions.

pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
for k, v in model.named_modules():
    if hasattr(v, 'bias'):
        assert isinstance(v.bias, nn.Parameter) == isinstance(v.bias, torch.Tensor)
    if isinstance(v, nn.BatchNorm2d):
        pg0.append(v.weight)
    elif hasattr(v, 'weight'):
        assert isinstance(v.weight, nn.Parameter) == isinstance(v.weight, torch.Tensor)
        pg1.append(v.weight)  # apply weight decay
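
For reference, grouping with the nn.Parameter check discussed here would look roughly like the sketch below (reusing model and the imports from the snippets above; a sketch of the approach, not a verbatim copy of the merged code):

pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
for k, v in model.named_modules():
    if hasattr(v, 'bias') and isinstance(v.bias, nn.Parameter):
        pg2.append(v.bias)  # biases, no decay
    if isinstance(v, nn.BatchNorm2d):
        pg0.append(v.weight)  # BatchNorm weights, no decay
    elif hasattr(v, 'weight') and isinstance(v.weight, nn.Parameter):
        pg1.append(v.weight)  # all other weights, apply decay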

@glenn-jocher (Member)

Adding code to resolve TODO #679 to this PR. We'll kill two birds with one stone.

@glenn-jocher glenn-jocher linked an issue Nov 1, 2020 that may be closed by this pull request
@glenn-jocher glenn-jocher added the enhancement New feature or request label Nov 1, 2020
@glenn-jocher (Member)

/rebase

@github-actions github-actions bot force-pushed the change_optimizer_param_group_method branch from b66ca24 to e0000f7 on November 1, 2020 22:36
@glenn-jocher (Member)

Changes look good, waiting on CI checks to merge.

@glenn-jocher (Member)

Bug fixed, checks passing, merging this PR now.

@hoonyyhoon thank you for your contributions! I think this update will provide more robust parameter group settings going forward, and the code is now more widely applicable to future use cases. Nice job!

@glenn-jocher glenn-jocher merged commit 187f7c2 into ultralytics:master Nov 1, 2020
burglarhobbit pushed a commit to burglarhobbit/yolov5 that referenced this pull request Jan 1, 2021
* Change optimizer parameters group method

* Add torch nn

* Change isinstance method(torch.Tensor to nn.Parameter)

* parameter freeze fix, PEP8 reformat

* freeze bug fix

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
KMint1819 pushed a commit to KMint1819/yolov5 that referenced this pull request May 12, 2021 (same commits as above)
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022 (same commits as above)
Labels
enhancement New feature or request

Successfully merging this pull request may close these issues.

Transfer Learning - Freezing Parameters