
Support block granularity for QuantizeLinear and DequantizeLinear #3412

Merged: 11 commits merged into develop on Sep 28, 2024

Conversation

music-dino (Collaborator) commented:

Add support for block-level granularity in QuantizeLinear and DequantizeLinear.

y_scale and y_zero_point are transformed to match the shape of x by applying an unsqueeze->broadcast->reshape chain of transformations.
If the final block of x is smaller than the given block_size, the transformed y_scale and y_zero_point are sliced to remove the excess elements (see the sketch below).

Resolves migraphx-benchmark#192
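To make the transformation chain above concrete, here is a minimal standalone sketch of the shape arithmetic (the shapes, axis, and block size are invented for illustration; this is not the parser code itself):

```cpp
#include <iostream>
#include <vector>

int main()
{
    // Hypothetical example: x has shape {6, 4}, blocked along axis 0 with
    // block_size = 4, so y_scale has shape {2, 4} (ceil(6 / 4) = 2).
    std::vector<int> x_lens{6, 4};
    std::vector<int> scale_lens{2, 4};
    int axis       = 0;
    int block_size = 4;

    // unsqueeze: insert a dim of 1 after the block axis -> {2, 1, 4}
    std::vector<int> unsqueezed = scale_lens;
    unsqueezed.insert(unsqueezed.begin() + axis + 1, 1);

    // broadcast: expand the new dim to block_size -> {2, 4, 4}
    std::vector<int> broadcast = unsqueezed;
    broadcast[axis + 1] = block_size;

    // reshape: collapse the two block dims -> {8, 4}
    std::vector<int> reshaped = scale_lens;
    reshaped[axis] *= block_size;

    // slice: the final block of x is partial (6 < 8), so trim the scales
    // back to x's extent along the axis -> {6, 4}
    std::vector<int> sliced = reshaped;
    sliced[axis] = x_lens[axis];

    for(const auto& lens : {unsqueezed, broadcast, reshaped, sliced})
    {
        for(int d : lens)
            std::cout << d << ' ';
        std::cout << '\n';
    }
}
```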

@music-dino added the "Onnx Operators" label (Adding or modifying an Onnx Operator in the MIGraphX codebase) on Sep 4, 2024
codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 96.59091% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.02%. Comparing base (e4eb481) to head (f0c12a4).
Report is 1 commit behind head on develop.

Files with missing lines | Patch % | Lines
src/onnx/parse_quantizelinear.cpp 88.88% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3412      +/-   ##
===========================================
- Coverage    92.02%   92.02%   -0.01%     
===========================================
  Files          508      509       +1     
  Lines        20948    21005      +57     
===========================================
+ Hits         19278    19330      +52     
- Misses        1670     1675       +5     
Flag Coverage Δ
92.02% <96.59%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.



common_args.push_back(y_zero_point);
if(parser.opset_version < 19)
{
Contributor:

There are only two types supported for T1 before version 19. I appreciate your thoroughness in following up on those details, but it isn't clear that this operator should then support input type x as either float or int32; later versions should additionally support bfloat16 and float16. Thanks.

Collaborator (Author):

I thought about adding these as well, but decided against it, mainly because checking type constraints in parser code doesn't seem to be common practice, although I might be wrong here.


// Starting with version 19 ONNX introduced the constraint that x and y_scale types must be
// the same
if(parser.opset_version >= 19 and
Contributor:

As a matter of general approach: if common_type (below) can be safely derived even for versions prior to 19, is it okay not to flag errors for type mismatch, i.e. by looking at the opset version? This is just for my understanding; I am not suggesting a code change here. Thanks.

Collaborator (Author):

We have to flag it because the ONNX spec states it's a constraint for versions 19 and up.
The common type derivation and conversion could be done in all cases, without a version check, but that would do the extra work of computing the common type and looping over the arguments for opset versions 19+ to no avail, since we already know the types are the same in that case.

else
{
axis = tune_axis(x_rank, axis, op_name);
if(block_size == 0)
Contributor:

Our Quark-generated graph doesn't use an explicit block_size, so the assumption that it is 0 needs to be tweaked a little. This parameter is optional, so we should handle it not being supplied: rather than assuming 0, its final value should be computed as block_size_min. On the other hand, if block_size is supplied with a model and isn't 0, then we should check it against the lower and upper bounds. Thanks.

Contributor:

Therefore, please remove this exception clause.

Collaborator (Author):

The ONNX spec states it is an optional attribute with a default value of 0:
https://onnx.ai/onnx/operators/onnx__QuantizeLinear.html#attributes

Contributor:

We don't have the Quark-generated graph compiling with your current code. I can change this code later. Thanks.

// axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1]
float di = x_lens[axis];
float si = y_scale_lens[axis];
int block_size_min = std::ceil(di / si);
Contributor:

Sample code that could be added below, if the exception above is removed, to handle block_size == 0:

if(block_size == 0) block_size = block_size_min;
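Expanding on that suggestion, a minimal sketch of how the default and the bounds check might fit together (di, si, and block_size come from the snippet above; the upper bound follows the commented formula, and this is one reading of the suggestion, not the merged code):

```cpp
#include <cmath>

// Accepted range for axis i: [ceil(Di/Si), ceil(Di/(Si-1)) - 1]
int block_size_min = static_cast<int>(std::ceil(di / si));
// Guard the Si == 1 case, where the upper-bound formula would divide by zero.
int block_size_max =
    si > 1 ? static_cast<int>(std::ceil(di / (si - 1))) - 1 : static_cast<int>(di);

// block_size is optional (default 0); treat an absent value as
// block_size_min rather than an error.
if(block_size == 0)
    block_size = block_size_min;
else if(block_size < block_size_min or block_size > block_size_max)
    MIGRAPHX_THROW("QuantizeLinear: invalid block size");
```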

@causten added the "high priority" label (A PR with high priority for review and merging) on Sep 11, 2024
@migraphx-bot (Collaborator) commented:

Test | Batch | Rate new (73178d) | Rate old (7c2fdf) | Diff | Compare
torchvision-resnet50 64 3,250.28 3,249.19 0.03%
torchvision-resnet50_fp16 64 6,993.72 6,993.27 0.01%
torchvision-densenet121 32 2,434.70 2,434.31 0.02%
torchvision-densenet121_fp16 32 4,064.64 4,095.02 -0.74%
torchvision-inceptionv3 32 1,635.58 1,635.79 -0.01%
torchvision-inceptionv3_fp16 32 2,739.23 2,740.83 -0.06%
cadene-inceptionv4 16 776.31 776.76 -0.06%
cadene-resnext64x4 16 808.33 808.72 -0.05%
slim-mobilenet 64 7,455.05 7,455.28 -0.00%
slim-nasnetalarge 64 208.24 208.38 -0.07%
slim-resnet50v2 64 3,433.61 3,435.08 -0.04%
bert-mrpc-onnx 8 1,150.40 1,150.34 0.01%
bert-mrpc-tf 1 312.56 314.36 -0.57%
pytorch-examples-wlang-gru 1 418.10 418.46 -0.08%
pytorch-examples-wlang-lstm 1 382.21 499.68 -23.51% 🔴
torchvision-resnet50_1 1 780.24 772.72 0.97%
cadene-dpn92_1 1 437.74 397.74 10.06% 🔆
cadene-resnext101_1 1 381.54 383.61 -0.54%
onnx-taau-downsample 1 344.56 344.76 -0.06%
dlrm-criteoterabyte 1 35.05 35.10 -0.15%
dlrm-criteoterabyte_fp16 1 58.13 58.12 0.01%
agentmodel 1 8,198.74 7,932.67 3.35% 🔆
unet_fp16 2 58.11 57.85 0.44%
resnet50v1_fp16 1 941.69 935.68 0.64%
resnet50v1_int8 1 934.18 949.99 -1.66%
bert_base_cased_fp16 64 1,153.57 1,153.06 0.04%
bert_large_uncased_fp16 32 355.73 355.77 -0.01%
bert_large_fp16 1 211.68 210.32 0.64%
distilgpt2_fp16 16 2,160.36 2,161.65 -0.06%
yolov5s 1 537.57 534.27 0.62%
tinyllama 1 43.41 43.40 0.03%
vicuna-fastchat 1 176.57 170.43 3.60% 🔆
whisper-tiny-encoder 1 418.00 418.17 -0.04%
whisper-tiny-decoder 1 433.60 426.09 1.76%

This build is not recommended to merge 🔴

@migraphx-bot (Collaborator) commented:


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance
✅ resnet50v1: PASSED: MIGraphX meets tolerance
✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
✅ bert_large: PASSED: MIGraphX meets tolerance
✅ yolov5s: PASSED: MIGraphX meets tolerance
✅ tinyllama: PASSED: MIGraphX meets tolerance
✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

{
x_zero_point = info.add_instruction(
make_op("multibroadcast", {{"out_lens", input_lens}}), x_zero_point);
MIGRAPHX_THROW("DequantizeLinear: y_scale and y_zero_point shapes must be equal. "
Contributor @lakhinderwalia commented Sep 24, 2024:

Nit: "DequantizeLinear: y_scale and y_zero_point shape mismatch."

Contributor @lakhinderwalia left a review:

Left you some very minor comments. They are optional.
Approved.

if(parser.opset_version < 19)
{
auto common_type = common_shape({args[0]->get_shape(), args[1]->get_shape()}).type();
std::transform(args.begin(), args.begin() + 2, args.begin(), [&](auto ins) {
Contributor:

Just trying to understand here: why is it args.begin() + 2 and not args.end()? Thanks.

Collaborator (Author):

Prior to version 19, the first two inputs (x and y_scale) can have different float types, so a conversion to a common type is needed to make the MIGraphX operator work. The optional third input will have a type of int8 or uint8, and we want to leave it that way.
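For readability, here is that pre-opset-19 path as an annotated sketch (common_shape and the surrounding parser context are taken as given from the snippet above; this is a paraphrase, not the exact merged code):

```cpp
if(parser.opset_version < 19)
{
    // x (args[0]) and y_scale (args[1]) may be different float types,
    // so convert both to their common type.
    auto common_type = common_shape({args[0]->get_shape(), args[1]->get_shape()}).type();
    std::transform(args.begin(), args.begin() + 2, args.begin(), [&](auto ins) {
        if(ins->get_shape().type() == common_type)
            return ins;
        return info.add_instruction(make_op("convert", {{"target_type", common_type}}), ins);
    });
    // args[2], the optional zero point, stays int8/uint8 and is deliberately skipped.
}
```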

auto common_args = add_common_args(*info.mod, {args[0], y_scale});

if(args.size() == 3)
if(output_type.has_value() and args.size() == 3 and
Contributor:

Style: Please do the exception processing in one clause, on line 59 above.

Collaborator @CharlieL7 left a review:

Please add some onnx parse tests that show the expected MIGraphX IR output from these changes.

src/onnx/quantize_dequantize_linear.cpp (review comment outdated and resolved)
music-dino (Collaborator, Author) replied:

> Please add some onnx parse tests that show the expected MIGraphX IR output from these changes.

I've added a couple of parse tests.
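For readers unfamiliar with those tests: MIGraphX parse tests generally build the expected IR by hand and compare it against the parsed model. A schematic example in that style (the file name, shapes, and exact instruction sequence here are invented for illustration; they are not the tests added in this PR):

```cpp
TEST_CASE(quantizelinear_blocked_test)
{
    // Expected IR: x {6, 4} quantized along axis 0 with block_size = 4,
    // scales {2, 4} expanded via unsqueeze -> multibroadcast -> reshape -> slice.
    migraphx::program p;
    auto* mm   = p.get_main_module();
    auto x     = mm->add_parameter("x", {migraphx::shape::float_type, {6, 4}});
    auto scale = mm->add_parameter("scale", {migraphx::shape::float_type, {2, 4}});

    auto s = mm->add_instruction(migraphx::make_op("unsqueeze", {{"axes", {1}}}), scale);
    s = mm->add_instruction(migraphx::make_op("multibroadcast", {{"out_lens", {2, 4, 4}}}), s);
    s = mm->add_instruction(migraphx::make_op("reshape", {{"dims", {8, 4}}}), s);
    s = mm->add_instruction(
        migraphx::make_op("slice", {{"axes", {0}}, {"starts", {0}}, {"ends", {6}}}), s);
    mm->add_instruction(migraphx::make_op("quantizelinear"), x, s);

    // Parse the corresponding .onnx file and check the IR matches.
    auto prog = migraphx::parse_onnx("quantizelinear_blocked_test.onnx");
    EXPECT(p == prog);
}
```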

@music-dino music-dino requested a review from a team as a code owner September 27, 2024 13:41
@causten causten merged commit 74bc6be into develop Sep 28, 2024
47 of 48 checks passed
@causten causten deleted the quantizelinear_blocked branch September 28, 2024 04:03
Labels: high priority (A PR with high priority for review and merging), Onnx Operators (Adding or modifying an Onnx Operator in the MIGraphX codebase)
Development

Successfully merging this pull request may close these issues.

Support block granularity for QuantizeLinear and DequantizeLinear
5 participants