
Failure of reproducing sparsification of MobileBERT-oBERT-SQuAD #1534

Closed · absol13 opened this issue Apr 24, 2023 · 4 comments · Fixed by #1539
Labels: bug

absol13 commented Apr 24, 2023

Describe the bug
I am trying to reproduce the sparsification process of the MobileBERT-oBERT-SQuAD model (zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni).
I trained the base model (zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base-none) with the 32-epoch recipe from SparseZoo. However, the step that converts the sparsified model is missing from the model description, so I used the sparseml.transformers.export_onnx tool to convert it to ONNX format.
Our new sparsified model is much slower than the released ONNX model: in our environment, its inference speed is about 7 times lower. What confuses me more is that the throughput of the model converted from the uploaded MobileBERT-oBERT-SQuAD checkpoint is also lower than that of the released model.
I also discovered that the deepsparse.analyze output for our new model differs from that of the released model: the released model's analysis reports the wall time of each layer in detail, while ours shows no model structure or per-layer analysis at all.
I wonder whether some step is missing when converting a sparsity-aware trained model to ONNX format.

I also found another issue reporting a similar case to mine: #1364.

Expected behavior
The 14layer_pruned50_quant-none-vnni model converted by sparseml.transformers.export_onnx from the officially released SparseZoo checkpoint should show throughput similar to that of the released ONNX model.

Environment
Include all relevant environment information:

  1. OS: Ubuntu 20.04
  2. Python version: 3.8
  3. SparseML version or commit hash: 1.4.4 (official latest)
  4. ML framework version(s): torch 1.12.0+cu113
  5. Other Python package versions: onnx 1.12.0, DeepSparse 1.4.2
  6. Other relevant environment information [e.g. hardware, CUDA version]: CUDA 11.3
  • DeepSparse test environment: Ubuntu 18.04, Intel(R) Xeon(R) Gold 6242 CPU (other packages have the same versions as above)

To Reproduce
sparseml.transformers.export_onnx --task question-answering --model_path ./ --sequence_length 384
(You can reproduce this using the checkpoint at zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/14layer_pruned50_quant-none-vnni.)
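
For reference, here is a minimal sketch of how such a throughput comparison can be run (assuming the deepsparse.compile_model Python API and the three int64 inputs produced by the Hugging Face question-answering export -- input_ids, attention_mask, token_type_ids; input names, dtypes, and file paths are placeholders and may differ for other exports):

# Rough throughput comparison sketch; file paths and input layout are
# assumptions, not part of the original report.
import time

import numpy as np
from deepsparse import compile_model

BATCH, SEQ_LEN = 16, 384

def items_per_second(onnx_path: str, iters: int = 20) -> float:
    engine = compile_model(onnx_path, batch_size=BATCH)
    # Three int64 inputs of shape [batch, seq_len], as in the HF QA export.
    inputs = [np.ones((BATCH, SEQ_LEN), dtype=np.int64) for _ in range(3)]
    engine.run(inputs)  # warmup
    start = time.perf_counter()
    for _ in range(iters):
        engine.run(inputs)
    return BATCH * iters / (time.perf_counter() - start)

print("exported:", items_per_second("model.onnx"))
print("released:", items_per_second("released_model.onnx"))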

Errors

Analysis result of the newly converted model

Name                        | OutDims                    | KerDims                    | Strides      | ActSpars | Time(ms) |  Util(%) | TFLOPS   | Canonical Name
Naive Subgraph 0            | [-1, -1, -1, -1, -1]       | []                         | [-1, -1, -1] |   0.0000 | 2782.3520 |   0.0000 |   0.0000 | <none>
Total Time(MS): 2782.352000
Items per second: 5.750530
Batch Size: 16
Number of threads: 1

Analysis result of the officially released model by deepsparse.analyze

...
pyramid_247                 | [16, 384, 1, 1, 1]         | []                         | [1, 1, 1]    |   0.0000 |   0.0400 |   0.0000 |   0.0000 |
  sub_pyramid               | []                         | []                         | []           |   0.0000 |   0.0000 |   0.0000 |   0.0000 |
  shuffle                   | [16, 384, 1, 1, 2]         | []                         | []           |   0.0000 |   0.0270 |      nan |      nan | <none>
  shuffle                   | [16, 384, 1, 1, 1]         | []                         | []           |   0.0000 |   0.0050 |      nan |      nan | <none>
  shuffle                   | [16, 384, 1, 1, 1]         | []                         | []           |   0.0000 |   0.0040 |      nan |      nan | <none>
Naive Subgraph 1            | [-1, -1, -1, -1, -1]       | []                         | [-1, -1, -1] |   0.0000 |   0.1200 |   0.0000 |   0.0000 | <none>
Total Time(MS): 1677.556000
Items per second: 9.537685
Batch Size: 16
Number of threads: 1

== Layer Breakdown ==
Name                           | Summed Time | Percent Taken
Naive Subgraph 0               |    4.384    | 0.26%
shuffle                        |  296.853    | 17.92%
gemm                           |  791.355    | 47.76%
  kernel=[384, 512, 1, 1, 1]   |   24.326    | 1.47%
  kernel=[512, 128, 1, 1, 1]   |  258.028    | 15.57%
  kernel=[128, 512, 1, 1, 1]   |  166.731    | 10.06%
  kernel=[128, 128, 1, 1, 1]   |   70.079    | 4.23%
  kernel=[512, 2, 1, 1, 1]     |    1.248    | 0.08%
elementwise                    |   66.248    | 4.00%
ks_gemm                        |  375.736    | 22.68%
  kernel=[128, 512, 1, 1, 1]   |  355.582    | 21.46%
  kernel=[128, 128, 1, 1, 1]   |    5.504    | 0.33%
  kernel=[512, 128, 1, 1, 1]   |   14.650    | 0.88%
softmax                        |  122.250    | 7.38%
Naive Subgraph 1               |    0.120    | 0.01%
== Summed Total Time: 1656.9460 ms
== Items per second: 9.6563


mgoin commented Apr 24, 2023

Hi @absol13, is there a warning about dynamic shapes? It looks like no operations are being run in the engine. Please run the DeepSparse benchmark with a static shape for this test, for example: deepsparse.benchmark model.onnx --input_shapes "[1,384]"
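
If it helps, one quick way to check whether the exported graph actually has dynamic dimensions is to inspect its inputs with the onnx package (a rough sketch, not the engine's own check; the file name is a placeholder):

# Print each graph input's shape; a symbolic dim_param or a missing
# dim_value indicates a dynamic dimension.
import onnx

model = onnx.load("model.onnx")
for inp in model.graph.input:
    dims = [d.dim_param if d.dim_param else d.dim_value
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)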

absol13 commented Apr 25, 2023

Hello, and thanks for your fast response, but this does not seem to be a dynamic-input-shape problem. Here is another odd benchmark result for the model converted by sparseml.transformers.export_onnx:

2023-04-25 02:22:35 deepsparse.benchmark.benchmark_model INFO     deepsparse.engine.Engine:
        onnx_file_path: model.onnx
        batch_size: 16
        num_cores: 1
        num_streams: 1
        scheduler: Scheduler.default
        fraction_of_supported_ops: 0.0
        cpu_avx_type: avx2
        cpu_vnni: False

As shown above, fraction_of_supported_ops is 0.0, in contrast to a value close to 1 for the officially released model. It seems the DeepSparse engine cannot make sense of models generated by sparseml.transformers.export_onnx.
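
Comparing the op-type counts of the two graphs with the onnx package can help show whether the exported graph is structurally different from the released one (a rough sketch; the file paths are placeholders for the two models being compared):

# Compare op-type histograms of the exported and released ONNX graphs.
from collections import Counter

import onnx

def op_histogram(path: str) -> Counter:
    model = onnx.load(path)
    return Counter(node.op_type for node in model.graph.node)

exported = op_histogram("model.onnx")
released = op_histogram("released_model.onnx")
for op in sorted(set(exported) | set(released)):
    print(f"{op:24s} exported={exported[op]:4d} released={released[op]:4d}")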

mgoin commented Apr 25, 2023

Thanks for the additional detail, @absol13! I was able to replicate the problem and isolate it to the ONNX post-processing step of the export process. We are working on a fix.

In the image below, the ONNX model from the SparseZoo is on the left and the broken exported model is on the right. There is a single MatMul that wasn't folded properly during the ONNX export.
[Screenshot (2023-04-25): side-by-side comparison of the SparseZoo ONNX graph (left) and the exported ONNX graph (right)]
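
As a rough way to spot this kind of leftover node (a sketch, not the actual export check; the file path is a placeholder), you can look for float MatMul nodes that still read a weight directly from an initializer -- in a fully folded and quantized export these are converted or folded away:

# Flag float MatMul nodes whose weight comes straight from an initializer;
# in a properly folded/quantized export these are converted (e.g. to
# MatMulInteger) or folded away.
import onnx

model = onnx.load("model.onnx")
initializer_names = {init.name for init in model.graph.initializer}

for node in model.graph.node:
    if node.op_type == "MatMul" and any(name in initializer_names for name in node.input):
        print("unfolded MatMul:", node.name or node.output[0])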

mgoin linked a pull request Apr 26, 2023 that will close this issue
absol13 commented Apr 27, 2023

Thanks for your fast support.
