int4: disable const_folding for unpack_int4 #3322

lakhinderwalia · 2024-07-30T19:19:20Z

disable const_folding for unpack_int4

codecov · 2024-08-27T00:11:02Z

Codecov Report

Attention: Patch coverage is 86.07595% with 11 lines in your changes missing coverage. Please review.

Project coverage is 92.02%. Comparing base (1cd2854) to head (55e0f05).
Report is 1 commits behind head on develop.

Files with missing lines	Patch %	Lines
src/quantize_int4.cpp	91.66%	4 Missing ⚠️
src/shape.cpp	0.00%	4 Missing ⚠️
src/quantization.cpp	0.00%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3322      +/-   ##
===========================================
- Coverage    92.04%   92.02%   -0.02%     
===========================================
  Files          506      508       +2     
  Lines        20872    20948      +76     
===========================================
+ Hits         19212    19278      +66     
- Misses        1660     1670      +10

Flag	Coverage Δ
	`92.02% <86.07%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

lakhinderwalia · 2024-09-10T21:31:59Z

@pfultz2, @CharlieL7, this PR is being moved out of draft mode. Some of the functionality it depends on lives in other PRs, for example block_quantization support, int4 (i.e. signed) support for the basic pack/unpack, and parsing of int4 graphs.
At this stage, we are able to compile the checked-in test graphs as well as other full quark-generated graphs. Some of these other PRs need to merge first. More tests will be added as the review progresses. Thanks.

pfultz2

This is still missing a unit test for simplify_qdq. It really should be addressed either before merging or after merging.

lakhinderwalia · 2024-09-19T02:39:00Z

> This is still missing a unit test for simplify_qdq. It really should be addressed either before merging or after merging.

Added.

pfultz2 · 2024-09-20T15:32:19Z

There is an mlir failure on jenkins:

[2024-09-19T21:19:12.937Z] [   RUN    ] int4_unpack_conv

[2024-09-19T21:19:12.937Z] terminate called after throwing an instance of 'migraphx::version_1::exception'

[2024-09-19T21:19:12.937Z]   what():  /home/jenkins/workspace/AMDMIGraphX_PR-3322/src/targets/gpu/mlir.cpp:787: run_backend_pipeline: MLIR backend compilation failed: Error: The size of rock.alloc should be greather than zero.

[2024-09-19T21:19:12.937Z] Note: see current operation: %43 = "rock.alloc"() : () -> memref<16xi4, #gpu.address_space<private>>

Do you see this same error locally?

lakhinderwalia · 2024-09-20T15:34:51Z

> Do you see this same error locally?

No.

pfultz2 · 2024-09-23T16:55:45Z

If this needs a change from mlir to work, then just comment out the broken test for now so we can merge this in.

lakhinderwalia · 2024-09-23T17:18:24Z

Thanks. Actually, a fix has just been committed to MLIR, but I can disable the test. However, we still need one more approval :-)

causten · 2024-09-23T18:21:14Z

I've got #3467, which is pulling from mlir tip. I'm planning to merge it asap.

lakhinderwalia · 2024-09-23T18:25:26Z

> I've got #3467, which is pulling from mlir tip. I'm planning to merge it asap.

@causten, please include 1b99123a4e86629b07b1c2815668f6222d093d70 or newer. Thanks.

migraphx-bot · 2024-09-25T07:28:09Z

Test	Batch	Rate new 151126	Rate old e230c0	Diff	Compare
torchvision-resnet50	64	3,257.30	3,249.77	0.23%	✅
torchvision-resnet50_fp16	64	6,992.53	6,987.71	0.07%	✅
torchvision-densenet121	32	2,434.16	2,431.48	0.11%	✅
torchvision-densenet121_fp16	32	4,083.73	4,103.92	-0.49%	✅
torchvision-inceptionv3	32	1,635.86	1,637.67	-0.11%	✅
torchvision-inceptionv3_fp16	32	2,747.11	2,744.19	0.11%	✅
cadene-inceptionv4	16	779.17	779.19	-0.00%	✅
cadene-resnext64x4	16	808.07	808.74	-0.08%	✅
slim-mobilenet	64	7,457.20	7,462.54	-0.07%	✅
slim-nasnetalarge	64	208.33	208.50	-0.08%	✅
slim-resnet50v2	64	3,441.47	3,435.17	0.18%	✅
bert-mrpc-onnx	8	1,151.63	1,150.08	0.14%	✅
bert-mrpc-tf	1	322.39	314.23	2.60%	✅
pytorch-examples-wlang-gru	1	426.46	420.51	1.42%	✅
pytorch-examples-wlang-lstm	1	377.66	495.59	-23.79%	🔴
torchvision-resnet50_1	1	813.07	770.67	5.50%	🔆
cadene-dpn92_1	1	398.84	402.30	-0.86%	✅
cadene-resnext101_1	1	380.22	381.59	-0.36%	✅
onnx-taau-downsample	1	344.47	343.63	0.24%	✅
dlrm-criteoterabyte	1	35.03	35.05	-0.06%	✅
dlrm-criteoterabyte_fp16	1	58.16	58.08	0.14%	✅
agentmodel	1	7,894.72	8,076.83	-2.25%	✅
unet_fp16	2	57.99	57.92	0.11%	✅
resnet50v1_fp16	1	1,006.42	935.45	7.59%	🔆
resnet50v1_int8	1	975.69	956.44	2.01%	✅
bert_base_cased_fp16	64	1,172.53	1,153.21	1.68%	✅
bert_large_uncased_fp16	32	362.79	355.68	2.00%	✅
bert_large_fp16	1	210.74	211.87	-0.53%	✅
distilgpt2_fp16	16	2,205.73	2,159.18	2.16%	✅
yolov5s	1	531.90	533.70	-0.34%	✅
tinyllama	1	43.38	43.69	-0.70%	✅
vicuna-fastchat	1	168.05	172.04	-2.32%	✅
whisper-tiny-encoder	1	417.71	417.90	-0.05%	✅
whisper-tiny-decoder	1	424.38	424.90	-0.12%	✅

This build is not recommended to merge 🔴

migraphx-bot · 2024-09-25T07:28:11Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

lakhinderwalia self-assigned this Jul 30, 2024

lakhinderwalia linked an issue Jul 30, 2024 that may be closed by this pull request

Disable Constant folding on UnpackInt4 to avoid undoing compression. #3323

Closed

umangyadav reviewed Jul 30, 2024

src/quantize_int4.cpp (outdated, resolved)

umangyadav reviewed Jul 30, 2024

src/driver/main.cpp (outdated, resolved)

umangyadav reviewed Jul 30, 2024

src/propagate_constant.cpp (outdated, resolved)

pfultz2 reviewed Aug 18, 2024

src/quantize_int4.cpp (resolved)

lakhinderwalia force-pushed the lw/int4 branch 2 times, most recently from c9ba87b to 24419a6, August 26, 2024 22:22

lakhinderwalia force-pushed the lw/int4 branch 3 times, most recently from 9c944af to 8c3147d, September 5, 2024 05:15

lakhinderwalia force-pushed the lw/int4 branch from 8c3147d to d2d6bc0, September 10, 2024 21:13

lakhinderwalia requested a review from CharlieL7 September 10, 2024 21:18

lakhinderwalia marked this pull request as ready for review September 10, 2024 21:21

lakhinderwalia requested a review from causten as a code owner September 10, 2024 21:21

pfultz2 reviewed Sep 13, 2024

src/propagate_constant.cpp (outdated, resolved)

pfultz2 requested changes Sep 13, 2024

lakhinderwalia requested review from pfultz2 and umangyadav September 16, 2024 20:09

lakhinderwalia force-pushed the lw/int4 branch from ca640d7 to 9774153, September 17, 2024 20:51

pfultz2 reviewed Sep 17, 2024

src/targets/gpu/target.cpp (outdated, resolved)

pfultz2 reviewed Sep 17, 2024

test/gpu/mlir.cpp (resolved)

lakhinderwalia added 3 commits September 18, 2024 06:30

initial int4 changes

3ab317d

Limit PR scope: handle only migraphx based int4 weight quantization

28612c4

handle review comments

7c643e6

lakhinderwalia force-pushed the lw/int4 branch from 9774153 to 7c643e6, September 18, 2024 13:31

lakhinderwalia requested a review from pfultz2 September 18, 2024 13:35

build failure

f772bd9

pfultz2 approved these changes Sep 18, 2024

int4 unit-test for simply_qdq

651567c

pfultz2 reviewed Sep 20, 2024

test/int4_test.cpp (resolved)

lakhinderwalia requested a review from shivadbhavsar September 24, 2024 00:07

shivadbhavsar approved these changes Sep 24, 2024

CharlieL7 removed request for CharlieL7 and umangyadav September 25, 2024 00:16

Merge branch 'develop' into lw/int4

151126f

pfultz2 and others added 3 commits September 25, 2024 12:26

Merge branch 'develop' into lw/int4

b711a51

fix mlir-debug test build due to attribute mismatched string

9c89e91

Merge branch 'develop' into lw/int4

55e0f05

causten merged commit b57b1e4 into develop Sep 26, 2024
42 of 48 checks passed

causten deleted the lw/int4 branch September 26, 2024 20:51
