How to Use QAT for Segmentation with YOLOv6? #1055

Open
hamedgorji opened this issue Jun 12, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@hamedgorji

Search before asking

  • I have searched the YOLOv6 issues and found no similar feature requests.

Description

Hi YOLOv6 Team,

I am currently working on a project that requires Quantization-Aware Training (QAT) for segmentation tasks using YOLOv6. I noticed that configurations like yolov6n_hs, yolov6n_opt, and yolov6n_opt_qat are available for detection but not for segmentation.

To enable QAT for segmentation, should I add the following configuration at the end of my config file?

ptq = dict(
    num_bits=8,
    calib_batches=4,
    # 'max', 'histogram'
    calib_method='max',
    # 'entropy', 'percentile', 'mse'
    histogram_amax_method='entropy',
    histogram_amax_percentile=99.99,
    calib_output_path='./',
    sensitive_layers_skip=False,
    sensitive_layers_list=[],
)

qat = dict(
    calib_pt='./assets/v6s_n_calib_max.pt',
    sensitive_layers_skip=False,
    sensitive_layers_list=[],
)

# Choose Rep-block by the training mode, choices=["repvgg", "hyper-search", "repopt"]
training_mode='repopt'

Could you please guide me on the correct approach to enable QAT for segmentation tasks?
Are there any example configurations or guidelines available for integrating QAT with segmentation in YOLOv6?

Thank you.

Use case

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
hamedgorji added the enhancement (New feature or request) label on Jun 12, 2024
@hamedgorji (Author) commented Jun 14, 2024

Update: I changed my config to:

# YOLOv6n-seg model
model = dict(
    type='YOLOv6n',
    pretrained='D:/YOLOv6-seg/assets/pretrained_opt.pt',
    scales='D:/YOLOv6-seg/assets/scale.pt',
    depth_multiple=0.33,
    width_multiple=0.25,
    backbone=dict(
        type='EfficientRep',
        num_repeats=[1, 6, 12, 18, 6],
        out_channels=[64, 128, 256, 512, 1024],
        fuse_P2=True,
        cspsppf=True,
        ),
    neck=dict(
        type='RepBiFPANNeck',
        num_repeats=[12, 12, 12, 12],
        out_channels=[256, 128, 128, 256, 256, 512],
        ),
    head=dict(
        type='EffiDeHead',
        in_channels=[128, 256, 512],
        num_layers=3,
        begin_indices=24,
        npr=256,
        nm=32,
        isseg=True,
        issolo=False,
        anchors=3,
        anchors_init=[[10,13, 19,19, 33,23],
                      [30,61, 59,59, 59,119],
                      [116,90, 185,185, 373,326]],
        out_indices=[17, 20, 23],
        strides=[8, 16, 32],
        atss_warmup_epoch=0,
        iou_type='siou',
        use_dfl=False, # set to True if you want to further train with distillation
        reg_max=0, # set to 16 if you want to further train with distillation
        distill_weight={
            'class': 1.0,
            'dfl': 1.0,
        },
    )
)

solver = dict(
    optim='SGD',
    lr_scheduler='Cosine',
    lr0=0.02,
    lrf=0.01,
    momentum=0.937,
    weight_decay=0.001,
    warmup_epochs=3.0,
    warmup_momentum=0.8,
    warmup_bias_lr=0.1
)

data_aug = dict(
    hsv_h=0.015,
    hsv_s=0.7,
    hsv_v=0.4,
    degrees=0.0,
    translate=0.1,
    scale=0.5,
    shear=0.0,
    flipud=0.0,
    fliplr=0.5,
    mosaic=1.0,
    mixup=0.0,
)


ptq = dict(
    num_bits=8,
    calib_batches=4,
    # 'max', 'histogram'
    calib_method='max',
    # 'entropy', 'percentile', 'mse'
    histogram_amax_method='entropy',
    histogram_amax_percentile=99.99,
    calib_output_path='./',
    sensitive_layers_skip=False,
    sensitive_layers_list=[],
)

qat = dict(
    calib_pt='./assets/v6n_calib_max.pt',
    sensitive_layers_skip=False,
    sensitive_layers_list=[],
)

# Choose Rep-block by the training mode, choices=["repvgg", "hyper-search", "repopt"]
training_mode='repopt'

I then ran the following command:

python tools/train.py --data data/data.yaml --output-dir ./runs/train_im256_30636_qat --conf configs/yolov6n_seg_opt_qat.py --quant --distill --distill_feat --batch 32 --epochs 10 --workers 32 --teacher_model_path "D:/YOLOv6-seg/assets/pretrained_opt.pt" --device 0

It loaded the model first, but at the end it failed with this error:

Skip Layer detect.proj_conv
Traceback (most recent call last):
  File "D:\YOLOv6-seg\tools\train.py", line 142, in <module>
    main(args)
  File "D:\YOLOv6-seg\tools\train.py", line 127, in main
    trainer = Trainer(args, cfg, device)
  File "D:\YOLOv6-seg\yolov6\core\engine.py", line 68, in __init__
    self.quant_setup(model, cfg, device)
  File "D:\YOLOv6-seg\yolov6\core\engine.py", line 602, in quant_setup
    model.neck.upsample_enable_quant(cfg.ptq.num_bits, cfg.ptq.calib_method)
  File "C:\Users\Hamed\miniconda3\envs\yolov6\lib\site-packages\torch\nn\modules\module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'RepBiFPANNeck' object has no attribute 'upsample_enable_quant'

I got this error for both PTQ and QAT. From the traceback, quant_setup in yolov6/core/engine.py calls model.neck.upsample_enable_quant() unconditionally, and the segmentation RepBiFPANNeck does not define that method.

@hamedgorji (Author)

Update 2: I fixed the above error by adding the following function to the RepBiFPANNeck class:

    def upsample_enable_quant(self, num_bits, calib_method):
        print("Insert fakequant after upsample")
        from pytorch_quantization import nn as quant_nn
        from pytorch_quantization.tensor_quant import QuantDescriptor
        conv2d_input_default_desc = QuantDescriptor(num_bits=num_bits, calib_method=calib_method)
        self.upsample_feat0_quant = quant_nn.TensorQuantizer(conv2d_input_default_desc)
        self.upsample_feat1_quant = quant_nn.TensorQuantizer(conv2d_input_default_desc)
        self._QUANT = True

But now I get another error regarding the calibration amax when I try to run PTQ:

python tools/train.py --data data/data.yaml --output-dir ./runs/train_im256_30636_ptq --conf configs/yolov6n_seg_opt_qat.py --quant --calib --batch 16 --workers 0 --device 0

Traceback (most recent call last):
  File "D:\YOLOv6-seg\tools\train.py", line 142, in <module>
    main(args)
  File "D:\YOLOv6-seg\tools\train.py", line 130, in main
    trainer.calibrate(cfg)
  File "D:\YOLOv6-seg\yolov6\core\engine.py", line 592, in calibrate
    ptq_calibrate(self.model, self.train_loader, cfg)
  File "D:\YOLOv6-seg\tools\qat\qat_utils.py", line 61, in ptq_calibrate
    compute_amax(model, method=cfg.ptq.histogram_amax_method, percentile=cfg.ptq.histogram_amax_percentile)
  File "D:\YOLOv6-seg\tools\qat\qat_utils.py", line 47, in compute_amax
    module.load_calib_amax()
  File "C:\Users\Hamed\miniconda3\envs\yolov6\lib\site-packages\pytorch_quantization\nn\modules\tensor_quantizer.py", line 237, in load_calib_amax
    raise RuntimeError(err_msg + " Passing 'strict=False' to `load_calib_amax()` will ignore the error.")
RuntimeError: Calibrator returned None. This usually happens when calibrator hasn't seen any tensor. Passing 'strict=False' to `load_calib_amax()` will ignore the error.
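
For context, the usual pytorch_quantization calibration flow, which I understand ptq_calibrate to follow, is to put every TensorQuantizer into calibration mode, run a few batches, and then load the recorded amax values. Here is a sketch based on NVIDIA's examples, not the repo's exact code (the batch[0] indexing is an assumption about the loader's batch format):

import torch
from pytorch_quantization import nn as quant_nn

def collect_stats(model, data_loader, num_batches):
    # Record activation ranges instead of quantizing
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()
    # Feed a few batches so each calibrator sees real activations
    with torch.no_grad():
        for i, batch in enumerate(data_loader):
            model(batch[0].float().cuda())
            if i + 1 >= num_batches:
                break
    # Switch back to (fake-)quantized inference
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()

The error above means at least one quantizer was created but never received a tensor during this loop.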

@hamedgorji (Author)

@Chilicyy Any thoughts on this?

@hamedgorji (Author) commented Jun 24, 2024

Update 3: As I mentioned above, during the PTQ process I encountered a new error related to the calibration maximum (calib amax). Specifically, the error message indicated that the calibrator returned None, suggesting it hadn't seen any tensors during calibration.

To diagnose this, I added detailed logging and discovered that the neck.upsample_feat0_quant and neck.upsample_feat1_quant layers were encountering issues:

Error for neck.upsample_feat0_quant: Calibrator returned None. This usually happens when calibrator hasn't seen any tensor. Passing 'strict=False' to `load_calib_amax()` will ignore the error.
Loaded calib_amax for neck.upsample_feat0_quant
Error for neck.upsample_feat1_quant: Calibrator returned None. This usually happens when calibrator hasn't seen any tensor. Passing 'strict=False' to `load_calib_amax()` will ignore the error.
Loaded calib_amax for neck.upsample_feat1_quant
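
Roughly, the logging that produced these messages looked like this (a reconstructed sketch; the helper name is mine, not from the repo):

from pytorch_quantization import nn as quant_nn

def load_calib_amax_verbose(model):
    # Load amax per quantizer so failures point at a specific layer
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer) and module._calibrator is not None:
            try:
                module.load_calib_amax()
            except RuntimeError as e:
                print(f"Error for {name}: {e}")
                # strict=False skips quantizers whose calibrator saw no data
                module.load_calib_amax(strict=False)
                print(f"Loaded calib_amax for {name}")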

It seems that during the calibration phase, these layers are not receiving the expected data, leading to the calibrator returning None.

Maybe the issue is that I added the following function to the RepBiFPANNeck class without modifying the forward pass:

    def upsample_enable_quant(self, num_bits, calib_method):
        print("Insert fakequant after upsample")
        # Insert fakequant after upsample op to build TensorRT engine
        from pytorch_quantization import nn as quant_nn
        from pytorch_quantization.tensor_quant import QuantDescriptor
        conv2d_input_default_desc = QuantDescriptor(num_bits=num_bits, calib_method=calib_method)
        self.upsample_feat0_quant = quant_nn.TensorQuantizer(conv2d_input_default_desc)
        self.upsample_feat1_quant = quant_nn.TensorQuantizer(conv2d_input_default_desc)
        # global _QUANT
        self._QUANT = True

Any suggestions?

@hamedgorji (Author)

Update 4: I made some changes to the RepBiFPANNeck forward function, similar to RepPANNeck's, and the problem has been solved:

    def forward(self, input):
        (x3, x2, x1, x0) = input

        fpn_out0 = self.reduce_layer0(x0)
        f_concat_layer0 = self.Bifusion0([fpn_out0, x1, x2])
        # Route the fused (upsampled) features through the fakequant node
        # inserted by upsample_enable_quant, so the calibrator sees data
        if hasattr(self, '_QUANT') and self._QUANT is True:
            f_concat_layer0 = self.upsample_feat0_quant(f_concat_layer0)
        f_out0 = self.Rep_p4(f_concat_layer0)

        fpn_out1 = self.reduce_layer1(f_out0)
        f_concat_layer1 = self.Bifusion1([fpn_out1, x2, x3])
        # Same for the second upsample branch
        if hasattr(self, '_QUANT') and self._QUANT is True:
            f_concat_layer1 = self.upsample_feat1_quant(f_concat_layer1)
        pan_out2 = self.Rep_p3(f_concat_layer1)

        down_feat1 = self.downsample2(pan_out2)
        p_concat_layer1 = torch.cat([down_feat1, fpn_out1], 1)
        pan_out1 = self.Rep_n3(p_concat_layer1)

        down_feat0 = self.downsample1(pan_out1)
        p_concat_layer2 = torch.cat([down_feat0, fpn_out0], 1)
        pan_out0 = self.Rep_n4(p_concat_layer2)

        outputs = [pan_out2, pan_out1, pan_out0]

        return outputs
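
With the quantizers now on the data path, calibration can be sanity-checked by confirming that every TensorQuantizer recorded an amax. A minimal helper sketch (the function name is mine):

from pytorch_quantization import nn as quant_nn

def check_amax(model):
    # amax=None means that quantizer never saw data during calibration
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            print(f"{name}: amax={module.amax}")

After the forward-pass fix, neck.upsample_feat0_quant and neck.upsample_feat1_quant report real values instead of None.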

@hamedgorji (Author) commented Jul 11, 2024

@zhiyelee @cfc4n @yeldarby @rainsun
I was finally able to train my model using the QAT approach. However, when I converted it to ONNX using qat_export.py, the model failed to perform any segmentation. This segmentation QAT approach has been quite problematic, and I'm puzzled as to why the authors included this section when it is not fully tested. I spent about three weeks troubleshooting various issues but still couldn't get it to work.

If it had been mentioned that the segmentation model does not support QAT, I could have explored other options instead of losing three weeks of my time.
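
For anyone hitting the same export problem, two checks worth doing (a sketch, not the repo's qat_export.py; the model handle, input size, and file name are placeholders): switch pytorch_quantization into its ONNX-exportable fake-quant mode before torch.onnx.export, then run the exported graph once with onnxruntime to confirm the segmentation outputs (boxes plus mask prototypes/coefficients) are actually present:

import torch
import onnxruntime as ort
from pytorch_quantization import nn as quant_nn

# Export Q/DQ (fake-quant) nodes that TensorRT can parse
quant_nn.TensorQuantizer.use_fb_fake_quant = True

model.eval()
dummy = torch.randn(1, 3, 256, 256).cuda()  # placeholder input size
torch.onnx.export(model, dummy, "yolov6n_seg_qat.onnx", opset_version=13)

# If the mask-prototype output is missing, the seg head was not exported
sess = ort.InferenceSession("yolov6n_seg_qat.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
for out, arr in zip(sess.get_outputs(), sess.run(None, {input_name: dummy.cpu().numpy()})):
    print(out.name, arr.shape)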
