Support block granularity for QuantizeLinear and DequantizeLinear #3412
Conversation
Codecov Report

@@            Coverage Diff             @@
##           develop    #3412    +/-   ##
===========================================
- Coverage    92.02%   92.02%   -0.01%
===========================================
  Files          508      509       +1
  Lines        20948    21005      +57
===========================================
+ Hits         19278    19330      +52
- Misses        1670     1675       +5
===========================================
common_args.push_back(y_zero_point);
if(parser.opset_version < 19)
{
There are only two types supported for `T1` before version 19. I appreciate your thoroughness in following up on those details, but it isn't clear that this operator should then support input type `x` as either `float` or `int32`. Later versions should additionally support `bfloat16` and `float16`. Thanks.
I thought about adding these as well, but decided against it, mainly because I haven't noticed that checking type constraints in parser code is common practice, although I might be wrong here.
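For reference, a parser-level constraint check along the lines suggested might look roughly like the sketch below. This is a hypothetical illustration of the reviewer's suggestion, not code from this PR, and assumes `args[0]` is the `x` input:

```cpp
// Hypothetical sketch: restrict the T1 input type per opset version.
// Before opset 19, T1 only allows float and int32.
const auto x_type = args[0]->get_shape().type();
if(parser.opset_version < 19 and x_type != shape::float_type and
   x_type != shape::int32_type)
{
    MIGRAPHX_THROW("QUANTIZELINEAR: unsupported input type for opset < 19");
}
```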
// Starting with version 19 ONNX introduced the constraint that x and y_scale types must be
// the same
if(parser.opset_version >= 19 and
As a matter of general approach, if common_type (below) can be safely derived even for versions prior to 19, is it okay not to flag errors for a type mismatch, i.e. by looking at the opset version? This is just for my understanding -- I am not suggesting a code change here. Thanks.
We have to flag it because the ONNX spec states it's a constraint for versions 19 and up.
The common type derivation and conversion could be done in all cases, without a version check, but that would do the extra work of computing the common type and looping over the arguments for opset versions 19+ to no avail, since we already know the types are the same in that case.
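Concretely, the version-gated check described here follows the shape of the snippet in the diff above; a minimal sketch, assuming `args[0]` is `x` and `args[1]` is `y_scale`:

```cpp
// Opset 19+: x and y_scale must already share a type, so a mismatch is
// a spec violation and is flagged instead of being reconciled.
if(parser.opset_version >= 19 and
   args[0]->get_shape().type() != args[1]->get_shape().type())
{
    MIGRAPHX_THROW("QUANTIZELINEAR: x and y_scale must have the same type");
}
```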
else
{
    axis = tune_axis(x_rank, axis, op_name);
    if(block_size == 0)
Our quark-generated graph doesn't use an explicit `block_size`, so this assumption about it being 0 needs to be tweaked a little. I think this parameter is optional, so we should work well when it isn't supplied -- not assume it is 0, but compute its final value to be `block_size_min`. OTOH, if `block_size` is supplied with a model and it isn't 0, then we should check it against the lower and upper bounds. Thanks.
Therefore, please remove this exception clause.
The ONNX spec states it is an optional attribute, with a default value of 0:
https://onnx.ai/onnx/operators/onnx__QuantizeLinear.html#attributes
We don't have the quark-generated graph compiling with your current code. I can change this code later. Thanks.
// axis=i, the accepted range is [ceil(Di/Si), ceil(Di/(Si-1))-1]
float di = x_lens[axis];
float si = y_scale_lens[axis];
int block_size_min = std::ceil(di / si);
Sample code that can be added below if the exception above is removed, for `block_size == 0`:

if(block_size == 0) block_size = block_size_min;
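Putting that suggestion together with the bounds from the snippet above, the combined handling could look roughly like this (a sketch; `block_size_max` follows the `ceil(Di/(Si-1))-1` formula from the code comment, and the `si <= 1` case is left out for brevity):

```cpp
// Accepted range for axis i is [ceil(Di/Si), ceil(Di/(Si-1)) - 1].
int block_size_min = std::ceil(di / si);
int block_size_max = std::ceil(di / (si - 1)) - 1;

// block_size is optional and defaults to 0 per the ONNX spec; when it
// is absent, fall back to the smallest valid block size.
if(block_size == 0)
    block_size = block_size_min;
else if(block_size < block_size_min or block_size > block_size_max)
    MIGRAPHX_THROW("QUANTIZELINEAR: block_size out of accepted range");
```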
This build is not recommended to merge 🔴
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
src/onnx/parse_dequantizelinear.cpp (Outdated)
{
    x_zero_point = info.add_instruction(
        make_op("multibroadcast", {{"out_lens", input_lens}}), x_zero_point);
    MIGRAPHX_THROW("DequantizeLinear: y_scale and y_zero_point shapes must be equal. "
Nit: "DequantizeLinear: y_scale and y_zero_point shape mismatch."
Left you some very minor comments. They are optional.
Approved.
if(parser.opset_version < 19)
{
    auto common_type = common_shape({args[0]->get_shape(), args[1]->get_shape()}).type();
    std::transform(args.begin(), args.begin() + 2, args.begin(), [&](auto ins) {
Just trying to understand here: why is it `args.begin() + 2` and not `args.end()`? Thanks.
Prior to version 19, the first two inputs (`x` and `y_scale`) can have different float types, so a conversion to a common type is needed to make the MIGraphX operator work. The optional third input will have a type of int8 or uint8, and we want to leave it that way.
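In other words, the transform deliberately stops after the second argument. A sketch of the full pre-opset-19 path based on the diff above (the `convert` insertion is how MIGraphX typically reconciles types; treat the exact details as illustrative):

```cpp
// Pre-opset-19: x and y_scale may be different float types, so convert
// only those two to their common type. The optional zero point in
// args[2] stays int8/uint8 and is intentionally left untouched.
auto common_type = common_shape({args[0]->get_shape(), args[1]->get_shape()}).type();
std::transform(args.begin(), args.begin() + 2, args.begin(), [&](auto ins) {
    if(ins->get_shape().type() == common_type)
        return ins;
    return info.add_instruction(make_op("convert", {{"target_type", common_type}}), ins);
});
```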
auto common_args = add_common_args(*info.mod, {args[0], y_scale});
-    if(args.size() == 3)
+    if(output_type.has_value() and args.size() == 3 and
Style: Please do the exception processing in one clause, on line 59 above.
Please add some onnx parse tests that show the expected MIGraphX IR output from these changes.
I've added a couple of parse tests.
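For context, parse tests in MIGraphX usually build the expected IR by hand and compare it against the parsed ONNX file. A rough sketch of that pattern follows; the test name, file name, shapes, and elided instruction sequence are all illustrative, not taken from this PR:

```cpp
TEST_CASE(quantizelinear_blocked_test)
{
    // Expected IR, built by hand.
    migraphx::program p;
    auto* mm = p.get_main_module();
    auto x = mm->add_parameter("x", migraphx::shape{migraphx::shape::float_type, {4, 8}});
    auto scale =
        mm->add_parameter("y_scale", migraphx::shape{migraphx::shape::float_type, {4, 2}});
    // ... the unsqueeze/broadcast/reshape of y_scale and the quantizelinear
    // instruction mirroring the parser's output would be added here ...

    // Parse the ONNX file and compare against the expected program.
    auto prog = migraphx::parse_onnx("quantizelinear_blocked_test.onnx");
    EXPECT(p == prog);
}
```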
Add support for block-level granularity in QuantizeLinear and DequantizeLinear.

`y_scale` and `y_zero_point` are transformed to match the shape of `x` by applying an unsqueeze->broadcast->reshape chain of transformations. If the final block of `x` is smaller than the given `block_size`, the transformed `y_scale` and `y_zero_point` are sliced to remove excess elements.

Resolves migraphx-benchmark#192
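As a rough illustration of that chain (a sketch with made-up shapes, not the PR's actual code): for `x` of shape {4, 8} quantized along axis 1 with `block_size` 4, `y_scale` of shape {4, 2} gets expanded to {4, 8}:

```cpp
// y_scale: {4, 2} -> {4, 2, 1}
y_scale = info.add_instruction(make_op("unsqueeze", {{"axes", {2}}}), y_scale);
// {4, 2, 1} -> {4, 2, 4}: replicate each scale across its block
y_scale =
    info.add_instruction(make_op("multibroadcast", {{"out_lens", {4, 2, 4}}}), y_scale);
// {4, 2, 4} -> {4, 8}: fold the blocks back onto the quantized axis
// (in practice a contiguous copy may be needed before the reshape)
y_scale = info.add_instruction(make_op("reshape", {{"dims", {4, 8}}}), y_scale);
```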