
How to use the AIMET SDK to deploy a quantized model in the Qualcomm Neural Processing SDK #168

Closed
HaihuaQiu opened this issue Aug 7, 2020 · 24 comments

@HaihuaQiu

HaihuaQiu commented Aug 7, 2020

I think this project is very nice. I know how to quantize an FP32 model in PyTorch, but I don't know how to use this SDK to deploy the quantized model to a Qualcomm DSP phone through the Qualcomm Neural Processing SDK, since the Qualcomm Neural Processing SDK ships with its own quantization tool.

@quic-akhobare
Contributor

Hi @HaihuaQiu. Please look at the documentation for the Snapdragon Neural Processing SDK for the detailed options.

At a high level, you use AIMET to optimize the model. You can also use AIMET QuantSim to simulate on-target accuracy and perform fine-tuning to make the model better. Once done, you export from QuantSim, which results in a modified, but still FP32, model.

You can import this model into the Snapdragon Neural Processing SDK like a regular model, using the dlc-convert and dlc-quantize tools.

Hope that helps.

@HaihuaQiu
Author

> Hi @HaihuaQiu. Please look at the documentation for the Snapdragon Neural Processing SDK for the detailed options.
>
> At a high level, you use AIMET to optimize the model. You can also use AIMET QuantSim to simulate on-target accuracy and perform fine-tuning to make the model better. Once done, you export from QuantSim, which results in a modified, but still FP32, model.
>
> You can import this model into the Snapdragon Neural Processing SDK like a regular model, using the dlc-convert and dlc-quantize tools.
>
> Hope that helps.

Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?

@quic-akhobare
Contributor

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?

Yes. For example, you can:

  1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
  2. Then use AIMET QuantSim to perform quantization-aware training.
  3. Export the model to ONNX.
  4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.

We have achieved pretty good results with the above recipe.
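
For reference, here is a minimal sketch of steps 1-3 in code using the aimet_torch APIs, with ResNet-18 standing in for your model. Treat it as illustrative only: exact signatures can differ between AIMET releases, bias correction is omitted, and the calibration data and the QAT training loop are placeholders you would replace with your own.

import torch
from torchvision import models

from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

input_shape = (1, 3, 224, 224)

# Stand-in FP32 model; replace with your own network.
model = models.resnet18(pretrained=True).eval()

# Step 1: cross-layer equalization on the FP32 model
# (bias correction, if used, would follow here).
equalize_model(model, input_shape)

model = model.cuda()
dummy_input = torch.rand(input_shape).cuda()

# Step 2: build the quantization simulation model and compute initial encodings.
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def calibrate(sim_model, _):
    # Run a few representative batches so AIMET can collect activation ranges;
    # replace this with a loop over real calibration data.
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)

# Quantization-aware training: fine-tune sim.model here with your usual PyTorch
# training loop (optimizer, loss, data loader) before exporting.

# Step 3: export a modified-but-still-FP32 ONNX model plus an encodings file.
sim.export(path='./', filename_prefix='model_qat', dummy_input=dummy_input.cpu())

Step 4 then happens entirely on the SNPE side with the converter and dlc-quantize tools.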

@HaihuaQiu
Author

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?
>
> Yes. For example, you can:
>
> 1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
> 2. Then use AIMET QuantSim to perform quantization-aware training.
> 3. Export the model to ONNX.
> 4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.
>
> We have achieved pretty good results with the above recipe.

Thanks for your answer!

@quic-ssiddego
Contributor

@HaihuaQiu Closing this issue for now. Please reopen it if the issue persists.

@zhuzm0902

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?
>
> Yes. For example, you can:
>
> 1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
> 2. Then use AIMET QuantSim to perform quantization-aware training.
> 3. Export the model to ONNX.
> 4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.
>
> We have achieved pretty good results with the above recipe.

I found that AIMET QuantSim finally exports a float model and a JSON file with quantization encodings. However, there is no way to feed the JSON file into the dlc-quantize tool of SNPE. So how can I use SNPE to quantize the model with the JSON file derived from AIMET?

@quic-akhobare
Contributor

> I found that AIMET QuantSim finally exports a float model and a JSON file with quantization encodings. However, there is no way to feed the JSON file into the dlc-quantize tool of SNPE. So how can I use SNPE to quantize the model with the JSON file derived from AIMET?

The short answer is that you don't need to feed the JSON file to SNPE (Snapdragon Neural Processing SDK). SNPE will calculate encodings equivalent to AIMET's via its dlc-quantize tool.

For some future use cases, there may be a need to import AIMET encodings into SNPE (and there is an option to do that with the latest tool), but for now you don't need it.

Hope that answers your question.
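
As a side note, the exported encodings file is plain JSON (AIMET also writes a YAML variant), so you can inspect it directly if you want to compare against what dlc-quantize computes. A small sketch follows; the file name comes from the filename_prefix you pass to export, and the exact schema (section names, list vs. single-dict entries) varies across AIMET versions, so the code simply walks whatever it finds.

import json

# Print per-tensor bitwidth/min/max from an AIMET-exported encodings file.
# 'model_qat.encodings' is a placeholder name; check what sim.export() actually wrote.
with open('model_qat.encodings') as f:
    encodings = json.load(f)

for section_name, section in encodings.items():  # e.g. activation vs. param encodings
    if not isinstance(section, dict):
        continue
    for tensor_name, enc in section.items():
        entries = enc if isinstance(enc, list) else [enc]
        for e in entries:
            if isinstance(e, dict) and 'min' in e and 'max' in e:
                print(f"{section_name} / {tensor_name}: "
                      f"bw={e.get('bitwidth')} min={e['min']} max={e['max']}")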

@ibei

ibei commented Dec 21, 2020

Hi, have you ever met such an error? It's from test code for 'QuantizationSimModel'. Thanks.

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:

  1. AimetTensorQuantizer.AimetTensorQuantizer(arg0: DlQuantization::QuantizationMode)

@ithmz

ithmz commented Sep 16, 2021

Trying to quantize a YOLOv5 model using AIMET, but without success.
Error: ModuleAttributeError: 'CustomMarker' object has no attribute 'f'

Code to reproduce:

device = select_device(0)
model = attempt_load("./best.pt", map_location=device, inplace=True, fuse=True)
sim_yolo.export(path='./', filename_prefix='quantized_yolov5', dummy_input=torch.rand(1, 3, 320, 320))

@quic-ssiddego
Contributor

quic-ssiddego commented Nov 30, 2021

@tsangz189 Could you please share the full code you used for quantizing the model, before the export is performed? (details of attempt_load())

@bomerzz

bomerzz commented Jan 6, 2022

Hi, I realised that this error occurs because StaticGridQuantWrapper does not implement the following parameters, which the YOLO parser expects the model to have (the error is thrown on line 145 of yolo.py). I'm not too sure whether amending the hasattr() checks or modifying StaticGridQuantWrapper to carry this information is the right way to go.

From yolo.py:
m_.i, m_.f, m_.type, m_.np = i, f, t, np # attach index, 'from' index, type, number params
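
One possible workaround, offered only as a sketch (not an official AIMET or YOLOv5 fix), is to copy YOLOv5's bookkeeping attributes from each wrapped layer onto its quant wrapper so that the parser's attribute lookups still succeed. The wrapper attribute name _module_to_wrap is assumed from AIMET's wrapper implementation and may differ between releases.

from aimet_torch.qc_quantize_op import StaticGridQuantWrapper

def copy_yolo_bookkeeping(sim_model):
    # Propagate YOLOv5's i / f / type / np attributes (index, 'from' index,
    # layer type string, parameter count) from each wrapped layer to its wrapper.
    for wrapper in sim_model.modules():
        if isinstance(wrapper, StaticGridQuantWrapper):
            inner = wrapper._module_to_wrap  # original YOLO layer (assumed attribute name)
            for attr in ('i', 'f', 'type', 'np'):
                if hasattr(inner, attr):
                    setattr(wrapper, attr, getattr(inner, attr))

Calling this on sim.model right after constructing the QuantizationSimModel, and before any YOLOv5 code touches those attributes, should at least get past the attribute errors.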

@lblbk

lblbk commented Jan 10, 2022

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?
>
> Yes. For example, you can:
>
> 1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
> 2. Then use AIMET QuantSim to perform quantization-aware training.
> 3. Export the model to ONNX.
> 4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.
>
> We have achieved pretty good results with the above recipe.

Sorry to bother you. I trained according to this process and fed the JSON file into the dlc-quantize tool of SNPE, but the results are very poor, completely different from the results of the AIMET model. What could be the reason?
Thanks!

@hasuoshenyun

hasuoshenyun commented Feb 7, 2022

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?
>
> Yes. For example, you can:
>
> 1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
> 2. Then use AIMET QuantSim to perform quantization-aware training.
> 3. Export the model to ONNX.
> 4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.
>
> We have achieved pretty good results with the above recipe.

I found that the activation encodings calculated by the dlc-quantize tool of SNPE are quite different from those of AIMET. The accuracy of the detection model I trained is always several percentage points lower than that of the floating-point model. Could you please help me with that?

I chose part of the encodings; they are shown below.

This is selected from "QAT.encodings.yaml", which was calculated by AIMET:
'485':
  - bitwidth: 8
    is_symmetric: 'False'
    max: 32.36826607584953
    min: -0.2558756172657013
    offset: -2
    scale: 0.1279378105612362
'488':
  - bitwidth: 8
    is_symmetric: 'False'
    max: 10.826611518859863
    min: -10.74202823638916
    offset: -127
    scale: 0.08458290100097657

And the following is calculated by the dlc-quantize tool:
[INFO] Setting activation for layer: backbone.backbone.stem.act.add#1_Hswish and buffer: 485
[INFO] bw: 8, min: -0.316865, max: 26.616622, delta: 0.105622, offset: -3.000000
[INFO] Setting activation for layer: backbone.backbone.dark2.0.conv and buffer: 488
[INFO] bw: 8, min: -54.411793, max: 54.840237, delta: 0.428439, offset: -127.000000
[INFO] Setting activation for layer: backbone.backbone.dark2.0.act.add#1_Hswish and buffer: 509
[INFO] bw: 8, min: -0.217683, max: 55.291472, delta: 0.217683, offset: -1.000000
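
Side note for anyone comparing the two dumps above: assuming the standard 8-bit asymmetric convention (which the numbers on both sides appear to follow), scale/delta and offset are derived directly from min/max, so the disagreement is in the calibrated ranges themselves rather than in the formula. A quick check:

# Sanity check of how scale/offset relate to min/max for 8-bit asymmetric quantization.
def encoding_from_min_max(t_min: float, t_max: float, bitwidth: int = 8):
    scale = (t_max - t_min) / (2 ** bitwidth - 1)
    offset = round(t_min / scale)  # zero point expressed in quantized steps
    return scale, offset

# AIMET encoding for tensor '485' from QAT.encodings.yaml
print(encoding_from_min_max(-0.2558756172657013, 32.36826607584953))  # ~ (0.1279378, -2)

# SNPE encoding for buffer 485 from the dlc-quantize log
print(encoding_from_min_max(-0.316865, 26.616622))                    # ~ (0.105622, -3)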

@quic-akhobare
Contributor

Can you share the flags you passed to SNPE? You want to select the options such that the encodings already determined by AIMET are imported into SNPE; SNPE will not recalculate them then.
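
To make that concrete, here is a rough sketch of driving the SNPE tools so that the AIMET encodings are carried through. The flag names --quantization_overrides (on the ONNX converter) and --override_params (on snpe-dlc-quantize) are assumptions based on recent SNPE releases rather than anything confirmed in this thread, so please verify them against snpe-onnx-to-dlc --help and snpe-dlc-quantize --help for your version; file names are placeholders.

import subprocess

# Convert the AIMET-exported ONNX model to DLC, attaching the AIMET encodings.
subprocess.run([
    'snpe-onnx-to-dlc',
    '--input_network', 'model_qat.onnx',
    '--quantization_overrides', 'model_qat.encodings',  # AIMET encodings JSON (assumed flag)
    '--output_path', 'model_qat.dlc',
], check=True)

# Quantize the DLC while honouring the overridden (AIMET) parameters instead of
# letting SNPE recompute its own ranges.
subprocess.run([
    'snpe-dlc-quantize',
    '--input_dlc', 'model_qat.dlc',
    '--input_list', 'raw_list.txt',
    '--override_params',  # assumed flag; check your SNPE version
    '--output_dlc', 'model_qat_quantized.dlc',
], check=True)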

@hasuoshenyun

> Can you share the flags you passed to SNPE? You want to select the options such that the encodings already determined by AIMET are imported into SNPE; SNPE will not recalculate them then.

The flags I used are:
snpe-dlc-quantize --input_dlc xx/QAT.dlc --input_list xx/raw_list.txt --use_enhanced_quantizer --output_dlc xx/QAT_quantized.dlc

And I found that the min/max values are basically the same after the activation layers between the AIMET calculation and the SNPE calculation, but are very different in other layers.

Besides, I trained a detection model. The accuracy of the online evaluation of the AIMET quantsim model is almost the same as the float model, but the accuracy of the post-static-quantized model is always several percentage points lower than that of the floating-point model. I got the outputs using snpe-net-run and then parsed the output to calculate the mAP. Could you help me with that?

@hasuoshenyun

Actually, I find that the AIMET quantization accuracy using the simulation model is higher than that of the post-static-quantized model from SNPE, by maybe 2 to 3 percentage points of mAP for a detection model.

@xmfbit

xmfbit commented Apr 16, 2022

> Hi, have you ever met such an error? It's from test code for 'QuantizationSimModel'. Thanks.
>
> TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
>
> 1. AimetTensorQuantizer.AimetTensorQuantizer(arg0: DlQuantization::QuantizationMode)

@ibei Hi, did you solve the problem? It's so weird...

@benv2k6

benv2k6 commented Jul 3, 2022

@xmfbit Hey, let me know if you managed to solve this issue; we are facing the same error.

@quic-akhobare
Contributor

@benv2k6 - can you share how exactly you are instantiating the QuantizationSimModel? And share the stack trace. I think we should create a separate ticket for this.

@benv2k6

benv2k6 commented Jul 6, 2022

Hi @quic-akhobare , thank you for your quick response.

The code to reproduce the error was:

import torch
from torchvision import models
from aimet_torch.quantsim import QuantizationSimModel
BATCH_SIZE = 1
input_shape = (BATCH_SIZE, 3, 224, 224)
model = models.resnet18(pretrained=True).cuda()
sim = QuantizationSimModel(model, default_output_bw=8, default_param_bw=8, dummy_input=torch.rand(input_shape).cuda())

I had this error when building the library myself, since we use Python 3.8 and the releases had Python 3.6 (up until yesterday).
It seems this error stems from the fact that libpymo.so is passing an enum (QuantizationMode) to AimetTensorQuantizer*.so, which doesn't recognize it as the same type.
I resolved that by compiling the AimetTensorQuantizer class into libpymo.so (that way the type is not passed between two shared libraries), but then I had an issue with the export command (errors from ONNX about an unsupported CustomMarker).

Eventually those issues were resolved yesterday with your release of AIMET for py38, so thank you very much for that. 🙇

But from this experience it seems that, currently, building from source is really specific to certain versions of Python, Ubuntu system libraries, and C++ toolchains.
For now we have managed, since our team uses Python 3.8, and we can work with the versions of torch and torchvision that AIMET depends on (along with some other dependencies that the whl brings in).

@gd2016229035

gd2016229035 commented Dec 30, 2022

> Can you share the flags you passed to SNPE? You want to select the options such that the encodings already determined by AIMET are imported into SNPE; SNPE will not recalculate them then.
>
> The flags I used are:
> snpe-dlc-quantize --input_dlc xx/QAT.dlc --input_list xx/raw_list.txt --use_enhanced_quantizer --output_dlc xx/QAT_quantized.dlc
>
> And I found that the min/max values are basically the same after the activation layers between the AIMET calculation and the SNPE calculation, but are very different in other layers.
>
> Besides, I trained a detection model. The accuracy of the online evaluation of the AIMET quantsim model is almost the same as the float model, but the accuracy of the post-static-quantized model is always several percentage points lower than that of the floating-point model. I got the outputs using snpe-net-run and then parsed the output to calculate the mAP. Could you help me with that?

@hasuoshenyun @quic-akhobare I am hitting the same issue: the min/max values are basically the same after the activation layers between the AIMET calculation and the SNPE calculation, but are very different in other layers.
Please let me know if you have any advice.

@benv2k6

benv2k6 commented Dec 31, 2022 via email

@WithFoxSquirrel

WithFoxSquirrel commented Jun 28, 2023

> Hi, I'm not sure I remember the details correctly, but please make sure you pass the correct flags to the dlc converter and the dlc quantizer. When you use AIMET, you need to pass a flag to the dlc converter. In my question I asked about the quantizer, which might be the reason for the issue. Hope it helps, Ben

Have you fixed this? I met the same problem; I also build AIMET myself.

> Hi @quic-akhobare, thank you for your quick response.
>
> The code to reproduce the error was:
>
> import torch
> from torchvision import models
> from aimet_torch.quantsim import QuantizationSimModel
> BATCH_SIZE = 1
> input_shape = (BATCH_SIZE, 3, 224, 224)
> model = models.resnet18(pretrained=True).cuda()
> sim = QuantizationSimModel(model, default_output_bw=8, default_param_bw=8, dummy_input=torch.rand(input_shape).cuda())
>
> I had this error when building the library myself, since we use Python 3.8 and the releases had Python 3.6 (up until yesterday). It seems this error stems from the fact that libpymo.so is passing an enum (QuantizationMode) to AimetTensorQuantizer*.so, which doesn't recognize it as the same type. I resolved that by compiling the AimetTensorQuantizer class into libpymo.so (that way the type is not passed between two shared libraries), but then I had an issue with the export command (errors from ONNX about an unsupported CustomMarker).
>
> Eventually those issues were resolved yesterday with your release of AIMET for py38, so thank you very much for that. 🙇
>
> But from this experience it seems that, currently, building from source is really specific to certain versions of Python, Ubuntu system libraries, and C++ toolchains. For now we have managed, since our team uses Python 3.8, and we can work with the versions of torch and torchvision that AIMET depends on (along with some other dependencies that the whl brings in).

I met the same problem when I use AIMET built from source (branch release-aimet-1.26, py38). Could you tell me how to compile the AimetTensorQuantizer class into libpymo.so, as you mentioned? Thank you!

@benv2k6

benv2k6 commented Jun 28, 2023

@WithFoxSquirrel Hi, the problem solved itself when I used the release of AIMET with Python 3.8; it was really just a few days after I posted the question. So essentially I didn't have to compile it myself.
