
How to use the AIMET SDK to deploy a quantized model in the Qualcomm Neural Processing SDK #168

Closed
HaihuaQiu opened this issue Aug 7, 2020 · 24 comments

@HaihuaQiu

HaihuaQiu commented Aug 7, 2020

I think this project is very nice. I know how to quantize an FP32 model in PyTorch, but I don't know how to use this SDK to deploy the quantized model to a Qualcomm DSP phone through the Qualcomm Neural Processing SDK, since the Qualcomm Neural Processing SDK ships with its own quantization tool.

@quic-akhobare
Contributor

Hi @HaihuaQiu. Please look at the documentation for the Snapdragon Neural Processing SDK for the detailed options.

At a high level, you use AIMET to optimize the model. You can also use AIMET QuantSim to simulate on-target accuracy and perform fine-tuning to make the model better. Once done, you export from QuantSim, which results in a modified, but still FP32, model.

You can import this model into the Snapdragon Neural Processing SDK like a regular model, using the dlc-convert and dlc-quantize tools.

Hope that helps.

@HaihuaQiu
Author

> Hi @HaihuaQiu. Please look at the documentation for the Snapdragon Neural Processing SDK for the detailed options.
>
> At a high level, you use AIMET to optimize the model. You can also use AIMET QuantSim to simulate on-target accuracy and perform fine-tuning to make the model better. Once done, you export from QuantSim, which results in a modified, but still FP32, model.
>
> You can import this model into the Snapdragon Neural Processing SDK like a regular model, using the dlc-convert and dlc-quantize tools.
>
> Hope that helps.

Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?

@quic-akhobare
Contributor

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?

Yes. For example, you can:

  1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
  2. Then use AIMET QuantSim to perform quantization-aware training.
  3. Export the model to ONNX.
  4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.

We have achieved pretty good results with the above recipe.
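
For reference, here is a minimal sketch of steps 1-3 in code using the aimet_torch APIs, with ResNet-18 standing in for your model. Treat it as illustrative only: exact signatures can differ between AIMET releases, bias correction is omitted, and the calibration data and the QAT training loop are placeholders you would replace with your own.

import torch
from torchvision import models

from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

input_shape = (1, 3, 224, 224)

# Stand-in FP32 model; replace with your own network.
model = models.resnet18(pretrained=True).eval()

# Step 1: cross-layer equalization on the FP32 model
# (bias correction, if used, would follow here).
equalize_model(model, input_shape)

model = model.cuda()
dummy_input = torch.rand(input_shape).cuda()

# Step 2: build the quantization simulation model and compute initial encodings.
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def calibrate(sim_model, _):
    # Run a few representative batches so AIMET can collect activation ranges;
    # replace this with a loop over real calibration data.
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass_callback=calibrate,
                      forward_pass_callback_args=None)

# Quantization-aware training: fine-tune sim.model here with your usual PyTorch
# training loop (optimizer, loss, data loader) before exporting.

# Step 3: export a modified-but-still-FP32 ONNX model plus an encodings file.
sim.export(path='./', filename_prefix='model_qat', dummy_input=dummy_input.cpu())

Step 4 then happens entirely on the SNPE side with the converter and dlc-quantize tools.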

@HaihuaQiu
Author

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?
>
> Yes. For example, you can:
>
> 1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
> 2. Then use AIMET QuantSim to perform quantization-aware training.
> 3. Export the model to ONNX.
> 4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.
>
> We have achieved pretty good results with the above recipe.

Thanks for your answer!

@quic-ssiddego
Contributor

@HaihuaQiu Closing this issue for now. Please reopen it if the issue persists.

@zhuzm0902

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?
>
> Yes. For example, you can:
>
> 1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
> 2. Then use AIMET QuantSim to perform quantization-aware training.
> 3. Export the model to ONNX.
> 4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.
>
> We have achieved pretty good results with the above recipe.

I found that AIMET QuantSim finally exports a float model and a JSON file with quantization encodings. However, there is no way to feed the JSON file into the dlc-quantize tool of SNPE. So how can I use SNPE to quantize the model with the JSON file derived from AIMET?

@quic-akhobare
Contributor

> I found that AIMET QuantSim finally exports a float model and a JSON file with quantization encodings. However, there is no way to feed the JSON file into the dlc-quantize tool of SNPE. So how can I use SNPE to quantize the model with the JSON file derived from AIMET?

The short answer is that you don't need to feed the JSON file to SNPE (Snapdragon Neural Processing SDK). SNPE will calculate encodings equivalent to AIMET's via its dlc-quantize tool.

For some future use cases, there may be a need to import AIMET encodings into SNPE (and there is an option to do that with the latest tool), but for now you don't need it.

Hope that answers your question.
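
As a side note, the exported encodings file is plain JSON (AIMET also writes a YAML variant), so you can inspect it directly if you want to compare against what dlc-quantize computes. A small sketch follows; the file name comes from the filename_prefix you pass to export, and the exact schema (section names, list vs. single-dict entries) varies across AIMET versions, so the code simply walks whatever it finds.

import json

# Print per-tensor bitwidth/min/max from an AIMET-exported encodings file.
# 'model_qat.encodings' is a placeholder name; check what sim.export() actually wrote.
with open('model_qat.encodings') as f:
    encodings = json.load(f)

for section_name, section in encodings.items():  # e.g. activation vs. param encodings
    if not isinstance(section, dict):
        continue
    for tensor_name, enc in section.items():
        entries = enc if isinstance(enc, list) else [enc]
        for e in entries:
            if isinstance(e, dict) and 'min' in e and 'max' in e:
                print(f"{section_name} / {tensor_name}: "
                      f"bw={e.get('bitwidth')} min={e['min']} max={e['max']}")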

@ibei

ibei commented Dec 21, 2020

Hi, have you ever met such an error? It's from test code for 'QuantizationSimModel'. Thanks.

TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:

  1. AimetTensorQuantizer.AimetTensorQuantizer(arg0: DlQuantization::QuantizationMode)

@ithmz

ithmz commented Sep 16, 2021

Trying to quantize a YOLOv5 model using AIMET, but without success.
Error: ModuleAttributeError: 'CustomMarker' object has no attribute 'f'

Code to reproduce:

device = select_device(0)
model = attempt_load("./best.pt", map_location=device, inplace=True, fuse=True)
sim_yolo.export(path='./', filename_prefix='quantized_yolov5', dummy_input=torch.rand(1, 3, 320, 320))

@quic-ssiddego
Contributor

quic-ssiddego commented Nov 30, 2021

@tsangz189 Could you please share the full code you used for quantizing the model, before the export is performed? (details of attempt_load())

@bomerzz

bomerzz commented Jan 6, 2022

Hi, I realised that this error occurs because StaticGridQuantWrapper does not implement the following parameters, which the YOLO parser expects the model to have (the error is thrown on line 145 of yolo.py). I'm not too sure whether amending the hasattr() checks or modifying StaticGridQuantWrapper to carry this information is the right way to go.

From yolo.py:
m_.i, m_.f, m_.type, m_.np = i, f, t, np # attach index, 'from' index, type, number params
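
One possible workaround, offered only as a sketch (not an official AIMET or YOLOv5 fix), is to copy YOLOv5's bookkeeping attributes from each wrapped layer onto its quant wrapper so that the parser's attribute lookups still succeed. The wrapper attribute name _module_to_wrap is assumed from AIMET's wrapper implementation and may differ between releases.

from aimet_torch.qc_quantize_op import StaticGridQuantWrapper

def copy_yolo_bookkeeping(sim_model):
    # Propagate YOLOv5's i / f / type / np attributes (index, 'from' index,
    # layer type string, parameter count) from each wrapped layer to its wrapper.
    for wrapper in sim_model.modules():
        if isinstance(wrapper, StaticGridQuantWrapper):
            inner = wrapper._module_to_wrap  # original YOLO layer (assumed attribute name)
            for attr in ('i', 'f', 'type', 'np'):
                if hasattr(inner, attr):
                    setattr(wrapper, attr, getattr(inner, attr))

Calling this on sim.model right after constructing the QuantizationSimModel, and before any YOLOv5 code touches those attributes, should at least get past the attribute errors.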

@lblbk

lblbk commented Jan 10, 2022

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?
>
> Yes. For example, you can:
>
> 1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
> 2. Then use AIMET QuantSim to perform quantization-aware training.
> 3. Export the model to ONNX.
> 4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.
>
> We have achieved pretty good results with the above recipe.

Sorry to bother you. I trained according to this process and fed the JSON file into the dlc-quantize tool of SNPE, but the results are very poor, completely different from the results of the AIMET model. What could be the reason?
Thanks!

@hasuoshenyun

hasuoshenyun commented Feb 7, 2022

> Thanks. You mean that we import the ONNX model created by AIMET QuantSim into the SNPE SDK to indirectly achieve quantization optimization?
>
> Yes. For example, you can:
>
> 1. Use AIMET to apply the cross-layer equalization (CLE) and bias correction (BC) techniques.
> 2. Then use AIMET QuantSim to perform quantization-aware training.
> 3. Export the model to ONNX.
> 4. Import this into the Snapdragon Neural Processing SDK. When you use the dlc-quantize tool, match the quantization parameters to what was used in Step 2.
>
> We have achieved pretty good results with the above recipe.

I found that the activation encodings calculated by the dlc-quantize tool of SNPE are quite different from those of AIMET. The accuracy of the detection model I trained is always several percentage points lower than that of the floating-point model. Could you please help me with that?

I chose part of the encodings; they are shown below.

This is selected from "QAT.encodings.yaml", which was calculated by AIMET:
'485':
  - bitwidth: 8
    is_symmetric: 'False'
    max: 32.36826607584953
    min: -0.2558756172657013
    offset: -2
    scale: 0.1279378105612362
'488':
  - bitwidth: 8
    is_symmetric: 'False'
    max: 10.826611518859863
    min: -10.74202823638916
    offset: -127
    scale: 0.08458290100097657

And the following is calculated by the dlc-quantize tool:
[INFO] Setting activation for layer: backbone.backbone.stem.act.add#1_Hswish and buffer: 485
[INFO] bw: 8, min: -0.316865, max: 26.616622, delta: 0.105622, offset: -3.000000
[INFO] Setting activation for layer: backbone.backbone.dark2.0.conv and buffer: 488
[INFO] bw: 8, min: -54.411793, max: 54.840237, delta: 0.428439, offset: -127.000000
[INFO] Setting activation for layer: backbone.backbone.dark2.0.act.add#1_Hswish and buffer: 509
[INFO] bw: 8, min: -0.217683, max: 55.291472, delta: 0.217683, offset: -1.000000
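
Side note for anyone comparing the two dumps above: assuming the standard 8-bit asymmetric convention (which the numbers on both sides appear to follow), scale/delta and offset are derived directly from min/max, so the disagreement is in the calibrated ranges themselves rather than in the formula. A quick check:

# Sanity check of how scale/offset relate to min/max for 8-bit asymmetric quantization.
def encoding_from_min_max(t_min: float, t_max: float, bitwidth: int = 8):
    scale = (t_max - t_min) / (2 ** bitwidth - 1)
    offset = round(t_min / scale)  # zero point expressed in quantized steps
    return scale, offset

# AIMET encoding for tensor '485' from QAT.encodings.yaml
print(encoding_from_min_max(-0.2558756172657013, 32.36826607584953))  # ~ (0.1279378, -2)

# SNPE encoding for buffer 485 from the dlc-quantize log
print(encoding_from_min_max(-0.316865, 26.616622))                    # ~ (0.105622, -3)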

@quic-akhobare
Contributor

Can you share the flags you passed to SNPE? You want to select the options such that the encodings already determined by AIMET are imported into SNPE; SNPE will not recalculate them then.
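
To make that concrete, here is a rough sketch of driving the SNPE tools so that the AIMET encodings are carried through. The flag names --quantization_overrides (on the ONNX converter) and --override_params (on snpe-dlc-quantize) are assumptions based on recent SNPE releases rather than anything confirmed in this thread, so please verify them against snpe-onnx-to-dlc --help and snpe-dlc-quantize --help for your version; file names are placeholders.

import subprocess

# Convert the AIMET-exported ONNX model to DLC, attaching the AIMET encodings.
subprocess.run([
    'snpe-onnx-to-dlc',
    '--input_network', 'model_qat.onnx',
    '--quantization_overrides', 'model_qat.encodings',  # AIMET encodings JSON (assumed flag)
    '--output_path', 'model_qat.dlc',
], check=True)

# Quantize the DLC while honouring the overridden (AIMET) parameters instead of
# letting SNPE recompute its own ranges.
subprocess.run([
    'snpe-dlc-quantize',
    '--input_dlc', 'model_qat.dlc',
    '--input_list', 'raw_list.txt',
    '--override_params',  # assumed flag; check your SNPE version
    '--output_dlc', 'model_qat_quantized.dlc',
], check=True)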

@hasuoshenyun

> Can you share the flags you passed to SNPE? You want to select the options such that the encodings already determined by AIMET are imported into SNPE; SNPE will not recalculate them then.

The flags I used are:
snpe-dlc-quantize --input_dlc xx/QAT.dlc --input_list xx/raw_list.txt --use_enhanced_quantizer --output_dlc xx/QAT_quantized.dlc

And I found that the min/max values are basically the same after the activation layers between the AIMET calculation and the SNPE calculation, but are very different in other layers.

Besides, I trained a detection model. The accuracy of the online evaluation of the AIMET quantsim model is almost the same as the float model, but the accuracy of the post-static-quantized model is always several percentage points lower than that of the floating-point model. I got the outputs using snpe-net-run and then parsed the output to calculate the mAP. Could you help me with that?

@hasuoshenyun

Actually, I find that the AIMET quantization accuracy using the simulation model is higher than that of the post-static-quantized model from SNPE, by maybe 2 to 3 percentage points of mAP for a detection model.

@xmfbit

xmfbit commented Apr 16, 2022

> Hi, have you ever met such an error? It's from test code for 'QuantizationSimModel'. Thanks.
>
> TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
>
> 1. AimetTensorQuantizer.AimetTensorQuantizer(arg0: DlQuantization::QuantizationMode)

@ibei Hi, did you solve the problem? It's so weird...

@benv2k6

benv2k6 commented Jul 3, 2022

@xmfbit Hey, let me know if you managed to solve this issue; we are facing the same error.

@quic-akhobare
Contributor

@benv2k6 - can you share how exactly you are instantiating the QuantizationSimModel? And share the stack trace. I think we should create a separate ticket for this.

@benv2k6

benv2k6 commented Jul 6, 2022

Hi @quic-akhobare , thank you for your quick response.

The code to reproduce the error was:

import torch
from torchvision import models
from aimet_torch.quantsim import QuantizationSimModel
BATCH_SIZE = 1
input_shape = (BATCH_SIZE, 3, 224, 224)
model = models.resnet18(pretrained=True).cuda()
sim = QuantizationSimModel(model, default_output_bw=8, default_param_bw=8, dummy_input=torch.rand(input_shape).cuda())

I had this error when building the library myself, since we use Python 3.8 and the releases had Python 3.6 (up until yesterday).
It seems this error stems from the fact that libpymo.so is passing an enum (QuantizationMode) to AimetTensorQuantizer*.so, which doesn't recognize it as the same type.
I resolved that by compiling the AimetTensorQuantizer class into libpymo.so (that way the type is not passed between two shared libraries), but then I had an issue with the export command (errors from ONNX about an unsupported CustomMarker).

Eventually those issues were resolved yesterday with your release of AIMET for py38, so thank you very much for that. 🙇

But from this experience it seems that, currently, building from source is really specific to certain versions of Python, Ubuntu system libraries, and C++ toolchains.
For now we have managed, since our team uses Python 3.8, and we can work with the versions of torch and torchvision that AIMET depends on (along with some other dependencies that the whl brings in).

@gd2016229035

gd2016229035 commented Dec 30, 2022

> Can you share the flags you passed to SNPE? You want to select the options such that the encodings already determined by AIMET are imported into SNPE; SNPE will not recalculate them then.
>
> The flags I used are:
> snpe-dlc-quantize --input_dlc xx/QAT.dlc --input_list xx/raw_list.txt --use_enhanced_quantizer --output_dlc xx/QAT_quantized.dlc
>
> And I found that the min/max values are basically the same after the activation layers between the AIMET calculation and the SNPE calculation, but are very different in other layers.
>
> Besides, I trained a detection model. The accuracy of the online evaluation of the AIMET quantsim model is almost the same as the float model, but the accuracy of the post-static-quantized model is always several percentage points lower than that of the floating-point model. I got the outputs using snpe-net-run and then parsed the output to calculate the mAP. Could you help me with that?

@hasuoshenyun @quic-akhobare I am hitting the same issue: the min/max values are basically the same after the activation layers between the AIMET calculation and the SNPE calculation, but are very different in other layers.
Please let me know if you have any advice.

@benv2k6

benv2k6 commented Dec 31, 2022 via email

@WithFoxSquirrel

WithFoxSquirrel commented Jun 28, 2023

> Hi, I'm not sure I remember the details correctly, but please make sure you pass the correct flags to the dlc converter and the dlc quantizer. When you use AIMET, you need to pass a flag to the dlc converter. In my question I asked about the quantizer, which might be the reason for the issue. Hope it helps, Ben

Have you fixed this? I met the same problem; I also build AIMET myself.

> Hi @quic-akhobare, thank you for your quick response.
>
> The code to reproduce the error was:
>
> import torch
> from torchvision import models
> from aimet_torch.quantsim import QuantizationSimModel
> BATCH_SIZE = 1
> input_shape = (BATCH_SIZE, 3, 224, 224)
> model = models.resnet18(pretrained=True).cuda()
> sim = QuantizationSimModel(model, default_output_bw=8, default_param_bw=8, dummy_input=torch.rand(input_shape).cuda())
>
> I had this error when building the library myself, since we use Python 3.8 and the releases had Python 3.6 (up until yesterday). It seems this error stems from the fact that libpymo.so is passing an enum (QuantizationMode) to AimetTensorQuantizer*.so, which doesn't recognize it as the same type. I resolved that by compiling the AimetTensorQuantizer class into libpymo.so (that way the type is not passed between two shared libraries), but then I had an issue with the export command (errors from ONNX about an unsupported CustomMarker).
>
> Eventually those issues were resolved yesterday with your release of AIMET for py38, so thank you very much for that. 🙇
>
> But from this experience it seems that, currently, building from source is really specific to certain versions of Python, Ubuntu system libraries, and C++ toolchains. For now we have managed, since our team uses Python 3.8, and we can work with the versions of torch and torchvision that AIMET depends on (along with some other dependencies that the whl brings in).

I met the same problem when I use AIMET built from source (branch release-aimet-1.26, py38). Could you tell me how to compile the AimetTensorQuantizer class into libpymo.so, as you mentioned? Thank you!

@benv2k6

benv2k6 commented Jun 28, 2023

@WithFoxSquirrel Hi, the problem solved itself when I used the release of AIMET with Python 3.8; it was really just a few days after I posted the question. So essentially I didn't have to compile it myself.
