
pruning detection models #4477

Closed
maxin-cn opened this issue Jan 18, 2022 · 13 comments

@maxin-cn

Is there any example for pruning detection models?

@J-shang (Contributor) commented Jan 19, 2022

Hello @XinMa-AI, we don't have an example for detection models right now. Which detection model do you want to prune? Alternatively, you could contribute an example to NNI if it's convenient for you.

J-shang self-assigned this Jan 19, 2022
@maxin-cn (Author)


Hi @J-shang, I would like to prune the 'fasterrcnn_resnet50_fpn' model from torchvision. I just want to demonstrate whether NNI can prune some detection models. So, could you please tell me whether NNI supports pruning detection models? Thanks~

@J-shang (Contributor) commented Jan 20, 2022


In fact, NNI prunes module weights. I find that fasterrcnn_resnet50_fpn contains Conv2d and Linear layers, so if you want to prune these layers, NNI can prune them.
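
For reference, a minimal sketch of that setup, using the same NNI v2 pruning API that appears later in this thread (the 0.5 sparsity value is only illustrative):

    import torchvision
    from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    # Target every Conv2d and Linear weight in the detector at 50% sparsity.
    config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d', 'Linear']}]

    pruner = L1NormPruner(model, config_list)
    _, masks = pruner.compress()  # masks map layer names to weight masks
    pruner.show_pruned_weights()

Note that the mask computation itself works at this level; the trouble discussed below starts at the ModelSpeedup step.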

@maxin-cn (Author)


Hi, we are currently using NNI to prune the YOLOv5 model, but we have encountered the following error. How can we solve this problem?

This is the code:

import torch, torchvision
from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner, L2NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup
from rich import print
from utils.general import check_img_size
from models.common import Conv
from models.experimental import attempt_load
from models.yolo import Detect
from utils.activations import SiLU
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = attempt_load('yolov5m.pt', map_location=device, inplace=True, fuse=False) # load FP32 model
model.eval()

for k, m in model.named_modules():
    if isinstance(m, Conv):  # assign export-friendly activations
        if isinstance(m.act, nn.SiLU):
            m.act = SiLU()
    elif isinstance(m, Detect):
        m.inplace = False
        m.onnx_dynamic = False
        if hasattr(m, 'forward_export'):
            m.forward = m.forward_export  # assign custom forward (optional)

imgsz = (640, 640)
imgsz *= 2 if len(imgsz) == 1 else 1 # expand

gs = int(max(model.stride)) # grid size (max stride)
imgsz = [check_img_size(x, gs) for x in imgsz] # verify img_size are gs-multiples
im = torch.zeros(1, 3, *imgsz).to(device) # image size(1,3,320,192) BCHW iDetection

for _ in range(2):
    y = model(im)

print(im.shape)

cfg_list = [{
'sparsity': 0.5, 'op_types': ['Conv2d']
}]

print('cfg_list', cfg_list)
pruner = L1NormPruner(model, cfg_list)
_, masks = pruner.compress()
pruner.show_pruned_weights()
pruner._unwrap_model()

ModelSpeedup(model, dummy_input=im, masks_file=masks).speedup_model()

This is the problem we ran into:
[2022-01-24 16:00:52] INFO (FixMaskConflict/MainThread) dim0 sparsity: 0.499922
[2022-01-24 16:00:52] INFO (FixMaskConflict/MainThread) dim1 sparsity: 0.000000
[2022-01-24 16:00:52] INFO (FixMaskConflict/MainThread) Dectected conv prune dim: 0
[2022-01-24 16:00:53] INFO (nni.compression.pytorch.speedup.compressor/MainThread) infer module masks...
model.0.conv /yolov5-master/nni/compression/pytorch/speedup/compressor.py Line 355
[2022-01-24 16:00:53] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for model.0.conv
model.24.aten::select.282 yolov5-master/nni/compression/pytorch/speedup/compressor.py Line 355
[2022-01-24 16:00:55] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for model.24.aten::select.282


In select module!!!
Traceback (most recent call last):
File "4.py", line 65, in
ModelSpeedup(model, dummy_input=im, masks_file=masks).speedup_model()
File "yolov5-master/nni/compression/pytorch/speedup/compressor.py", line 517, in speedup_model
self.infer_modules_masks()
File "yolov5-master/nni/compression/pytorch/speedup/compressor.py", line 362, in infer_modules_masks
self.update_direct_sparsity(curnode)
File "yolov5-master/nni/compression/pytorch/speedup/compressor.py", line 218, in update_direct_sparsity
func, dummy_input, in_masks, in_constants=in_constants, batch_dim=self.batch_dim)
File "yolov5-master/nni/compression/pytorch/speedup/infer_mask.py", line 80, in init
self.output = self.module(*dummy_input)
File "anaconda3/envs/torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'x'

@J-shang (Contributor) commented Jan 24, 2022

Hello @XinMa-AI ,

  1. This is because torch.jit.trace only supports outputs that are tensors, lists, tuples of tensors, or dictionaries of tensors. The solution is to modify the forward function outputs of all modules in fasterrcnn_resnet50_fpn to one of those types. However, I found another bug when speeding up this model; we will try to fix it as soon as possible.

  2. This is because the dummy input doesn't reach the branch that uses this module. Could you find out which module it is? If you can, you can exclude that layer in the config_list, as in the sketch after this comment; see https://nni.readthedocs.io/en/stable/Compression/Tutorial.html?highlight=exclude#specify-the-configuration

We have tried NNI compression on yolov3; maybe you can try v3 first, and we will try to fix the bugs you report.
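
For point 2, a minimal sketch of what that exclusion could look like in the config_list (the op name 'model.24' here is a hypothetical placeholder for whichever module you identify; the exclude key is the one described in the tutorial linked above):

    config_list = [
        # prune all Conv2d layers at 50% sparsity ...
        {'sparsity': 0.5, 'op_types': ['Conv2d']},
        # ... but exclude the module the dummy input never reaches;
        # replace 'model.24' with the module you actually identify
        {'exclude': True, 'op_names': ['model.24']},
    ]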

@syswyl commented Jan 24, 2022


Thanks for your reply to my colleague. For the latest version of the YOLOv5 model, I added a judgment condition to nni/compression/pytorch/speedup/compressor.py

        while not visit_queue.empty():
            curnode = visit_queue.get()
            if 'model.24.aten::select' in curnode.name:
                continue
            # forward mask inference for curnode
            self.update_direct_sparsity(curnode)
            successors = self.torch_graph.find_successors(curnode.unique_name)
            for successor in successors:
                in_degree[successor] -= 1
                if in_degree[successor] == 0:
                    visit_queue.put(self.torch_graph.name_to_node[successor])

to skip that branch. However, new problems appeared in the subsequent speedup process.
I checked and found that when processing the module, the node outputs contain three values, which does not match this processing framework. I wonder whether this is caused by the original yolov5 model or by how NNI processes it?
[screenshot of the node outputs, 2022-01-24 6:07 PM]

Traceback (most recent call last):
File "v9_yolo.py", line 84, in
ModelSpeedup(model, dummy_input=dummy_input.to(device), masks_file=masks).speedup_model()
File "nni-master25/nni/compression/pytorch/speedup/compressor.py", line 526, in speedup_model
self.infer_modules_masks()
File "nni-master25/nni/compression/pytorch/speedup/compressor.py", line 372, in infer_modules_masks
self.update_direct_sparsity(curnode)
File "nni-master25/nni/compression/pytorch/speedup/compressor.py", line 247, in update_direct_sparsity
node.outputs) == 1, 'The number of the output should be one after the Tuple unpacked manually'
AssertionError: The number of the output should be one after the Tuple unpacked manually

@zhiqwang (Contributor) commented Jan 24, 2022

Hi @syswyl ,

AssertionError: The number of the output should be one after the Tuple unpacked manually

I didn't dig into this problem, but I guess it is due to the SPPF introduced in YOLOv5 v6.0, check this ticket zhiqwang/yolort#234 (comment) for more information.

@maxin-cn (Author)


Hi @J-shang, we used the following code to check the predecessors and successors of each node, and we found that the predecessors of three nodes in model.24 of YOLOv5 were empty (please see the output and picture below).

        self.torch_graph.unpack_manually()
        in_degree = {}
        out_degree = {}
        visit_queue = queue.Queue()
        for node in self.torch_graph.nodes_py.nodes_op:
            print('node.unique_name', node.unique_name)
            successors = self.torch_graph.find_successors(node.unique_name)
            print('successors', successors)
            out_degree[node.unique_name] = len(successors)
            predecessors = self.torch_graph.find_predecessors(node.unique_name)
            print('predecessors', predecessors)
            in_degree[node.unique_name] = len(predecessors)
            if in_degree[node.unique_name] == 0:
                visit_queue.put(node)

        exit()

node.unique_name model.24.aten::select.209
successors ['model.24.aten::mul.210']
predecessors []

node.unique_name model.24.aten::select.229
successors ['model.24.aten::mul.230']
predecessors []

node.unique_name model.24.aten::select.249
successors ['model.24.aten::mul.250']
predecessors []

[screenshot of the model.24 graph]

We think the predecessors of the `select` operation should be model.24.m.2 or the `shape` operation after `Conv`. We are wondering whether those empty predecessor lists cause this problem. We also tried to directly skip `model.24` by using `cfg_list`, but we found it doesn't work.

@sqz-07040120

Hello @J-shang @XinMa-AI .

I am also pruning the yolov5 model now, and there are several problems.

Q1: When I run ModelSpeedup(model, dummy_input=im, masks_file=masks).speedup_model(), the dummy_input won't match the input of the corresponding layer (to be exact, the slice_block) in my network, so I changed *dummy_input to dummy_input[0] to work around this problem. It hasn't gone wrong yet.

Q2: When I run ModelSpeedup(model, dummy_input=im, masks_file=masks).speedup_model(), it stops with this problem:

[2022-01-25 18:58:29] INFO (nni.compression.pytorch.speedup.compressor/MainThread) infer module masks...
[2022-01-25 18:58:29] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for backbone.focus.slice_block
[2022-01-25 18:58:29] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for backbone.focus.CSM1.c1
[2022-01-25 18:58:30] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for backbone.focus.CSM1.aten::sigmoid.70
[2022-01-25 18:58:30] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for backbone.focus.CSM1.aten::mul.71
Traceback (most recent call last):
  File "C:\Users\39826\Desktop\caoooooooooooo\nni_prune.py", line 95, in <module>
    m_speedup.speedup_model()
  File "C:\Users\39826\AppData\Roaming\Python\Python39\site-packages\nni\compression\pytorch\speedup\compressor.py", line 509, in speedup_model
    self.infer_modules_masks()
  File "C:\Users\39826\AppData\Roaming\Python\Python39\site-packages\nni\compression\pytorch\speedup\compressor.py", line 355, in infer_modules_masks
    self.update_direct_sparsity(curnode)
  File "C:\Users\39826\AppData\Roaming\Python\Python39\site-packages\nni\compression\pytorch\speedup\compressor.py", line 216, in update_direct_sparsity
    _auto_infer = AutoMaskInference(
  File "C:\Users\39826\AppData\Roaming\Python\Python39\site-packages\nni\compression\pytorch\speedup\infer_mask.py", line 85, in __init__
    self.output = self.module(dummy_input[0])
TypeError: mul() missing 1 required positional arguments: "other"

I don't know what happened in mul(); the problem has never been solved.

Q3: I also tried to directly skip a block by using cfg_list, and I found it doesn't work.

[2022-01-25 18:58:28] INFO (FixMaskConflict/MainThread) {'backbone.focus.CSM1.c1': 1, 'backbone.CSM1.c1': 1, 'backbone.CSP1_1.CSM1.c1': 1, 'backbone.CSP1_1.CSM2.c1': 1, 'backbone.CSP1_1.CSM3.c1': 1, 'backbone.CSP1_1.CSM4.c1': 1, 'backbone.CSP1_1.CSM5.c1': 1, 'backbone.CSM2.c1': 1, 'backbone.CSP1_2.CSM1.c1': 1, 'backbone.CSP1_2.CSM2.c1': 1, 'backbone.CSP1_2.CSM3.c1': 1, 'backbone.CSP1_2.CSM4.c1': 1, 'backbone.CSP1_2.CSM5.c1': 1, 'backbone.CSM3.c1': 1, 'backbone.CSP1_3.CSM1.c1': 1, 'backbone.CSP1_3.CSM2.c1': 1, 'backbone.CSP1_3.CSM3.c1': 1, 'backbone.CSP1_3.CSM4.c1': 1, 'backbone.CSP1_3.CSM5.c1': 1, 'backbone.CSM4.c1': 1, 'backbone.SPP.CSM1.c1': 1, 'backbone.SPP.CSM2.c1': 1, 'CSPRes2_body1.CSM1.c1': 1, 'CSPRes2_body1.CSM2.c1': 1, 'CSPRes2_body1.CSM3.c1': 1, 'CSPRes2_body1.CSM4.c1': 1, 'CSPRes2_body1.CSM5.c1': 1, 'CMS1.c1': 1, 'CSPRes2_body2.CSM1.c1': 1, 'CSPRes2_body2.CSM2.c1': 1, 'CSPRes2_body2.CSM3.c1': 1, 'CSPRes2_body2.CSM4.c1': 1, 'CSPRes2_body2.CSM5.c1': 1, 'CMS2.c1': 1, 'CSPRes2_body3.CSM1.c1': 1, 'CSPRes2_body3.CSM2.c1': 1, 'CSPRes2_body3.CSM3.c1': 1, 'CSPRes2_body3.CSM4.c1': 1, 'CSPRes2_body3.CSM5.c1': 1, 'CMS3.c1': 1, 'Conv_Cat1.CSM1.c1': 1, 'CSPRes2_body4.CSM1.c1': 1, 'CSPRes2_body4.CSM2.c1': 1, 'CSPRes2_body4.CSM3.c1': 1, 'CSPRes2_body4.CSM4.c1': 1, 'CSPRes2_body4.CSM5.c1': 1, 'Conv_Cat2.CSM1.c1': 1, 'CSPRes2_body5.CSM1.c1': 1, 'CSPRes2_body5.CSM2.c1': 1, 'CSPRes2_body5.CSM3.c1': 1, 'CSPRes2_body5.CSM4.c1': 1, 'CSPRes2_body5.CSM5.c1': 1, 'conv1': 1, 'conv2': 1, 'conv3': 1}

I chose one of these names. Do you have any specific requirements?

@maxin-cn (Author)


Hi @sqz-07040120:

  1. The yolov5 model you used seems different from the one we used. We haven't found any specific requirement for dummy_input; any shape of dummy_input is OK.
  2. We have run into some problems similar to yours, but we haven't found a solution for them yet.
  3. You can see this link (https://nni.readthedocs.io/en/stable/Compression/Tutorial.html?highlight=exclude#specify-the-configuration) to learn how to use config_list, but we also found it does not work.

@sqz-07040120

sqz-07040120 commented Jan 25, 2022 via email

scarlett2018 added the model compression and question labels Mar 18, 2022
@scarlett2018 (Member)

Thanks all for the nice discussion; closing the issue as it has been thoroughly discussed.

@liruichao-eon

I encountered similar problems, and using config_list to exclude doesn't work either.
