Quantized model INT8 is not able to do inference or convert to any other type of model structure #9979

Closed
1 task done
Sanath1998 opened this issue Oct 30, 2022 · 8 comments
Labels
question Further information is requested Stale

Comments

@Sanath1998

Sanath1998 commented Oct 30, 2022

Search before asking

Question

Hi @glenn-jocher

I quantized a FLOAT32 model to INT8, and now I am unable to convert the model to any other format or run inference with detect.py and val.py.

Error messages:

  1. While running inference

ckpt = (ckpt.get('ema') or ckpt['model']).to(device).float() # FP32 model
AttributeError: 'collections.OrderedDict' object has no attribute 'to'

  2. While converting to any other model format using export.py

[TRT] [E] ModelImporter.cpp:779: ERROR: images:232 In function importInput:
[8] Assertion failed: convertDtype(onnxDtype.elem_type(), &trtDtype) && "Failed to convert ONNX date type to TensorRT data type."
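
The first error above (the AttributeError) is consistent with a checkpoint whose 'model' entry is a state_dict, that is, an OrderedDict of weights, rather than a full module object with a .to() method. The following is a minimal sketch, using only the standard library, of a guard that reports this case explicitly. `pick_model` and `FakeModule` are hypothetical names used for illustration and are not part of YOLOv5.

```python
from collections import OrderedDict


class FakeModule:
    """Hypothetical stand-in for a torch.nn.Module (illustration only)."""

    def to(self, device):
        return self

    def float(self):
        return self


def pick_model(ckpt):
    """Mimic the lookup from the traceback, but fail with a clearer message."""
    obj = ckpt.get('ema') or ckpt['model']
    if isinstance(obj, dict):  # OrderedDict is a dict subclass
        raise TypeError(
            "checkpoint stores a state_dict (weights only), not a module; "
            "rebuild the model and load the weights into it first"
        )
    return obj.to('cpu').float()


good = {'ema': None, 'model': FakeModule()}
bad = {'ema': None, 'model': OrderedDict(weight=[0.1, 0.2])}

print(type(pick_model(good)).__name__)
try:
    pick_model(bad)
except TypeError as err:
    print('TypeError:', err)
```

If this is the cause, the usual remedy is to save the whole model object in the checkpoint, or to rebuild the architecture and load the weights into it before calling the loader.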

@glenn-jocher Could you please look into this? Looking forward to your reply.

Additional

No response

@Sanath1998 Sanath1998 added the question Further information is requested label Oct 30, 2022
@glenn-jocher
Member

👋 Hello! Thanks for asking about YOLOv5 🚀 benchmarks. YOLOv5 inference is officially supported in 11 formats, and all formats are benchmarked for identical accuracy and to compare speed every 24 hours by the YOLOv5 CI.

💡 ProTip: Export to ONNX or OpenVINO for up to 3x CPU speedup. See CPU Benchmarks.
💡 ProTip: Export to TensorRT for up to 5x GPU speedup. See GPU Benchmarks.

Format                  export.py --include  Model
PyTorch                 -                    yolov5s.pt
TorchScript             torchscript          yolov5s.torchscript
ONNX                    onnx                 yolov5s.onnx
OpenVINO                openvino             yolov5s_openvino_model/
TensorRT                engine               yolov5s.engine
CoreML                  coreml               yolov5s.mlmodel
TensorFlow SavedModel   saved_model          yolov5s_saved_model/
TensorFlow GraphDef     pb                   yolov5s.pb
TensorFlow Lite         tflite               yolov5s.tflite
TensorFlow Edge TPU     edgetpu              yolov5s_edgetpu.tflite
TensorFlow.js           tfjs                 yolov5s_web_model/

Benchmarks

Benchmarks below were run on Colab Pro with the YOLOv5 tutorial notebook. To reproduce:

python utils/benchmarks.py --weights yolov5s.pt --imgsz 640 --device 0

Colab Pro V100 GPU

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=0, half=False, test=False
Checking setup...
YOLOv5 🚀 v6.1-135-g7926afc torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Setup complete ✅ (8 CPUs, 51.0 GB RAM, 46.7/166.8 GB disk)

Benchmarks complete (458.07s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4623                10.19
1             TorchScript        0.4623                 6.85
2                    ONNX        0.4623                14.63
3                OpenVINO           NaN                  NaN
4                TensorRT        0.4617                 1.89
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel        0.4623                21.28
7     TensorFlow GraphDef        0.4623                21.22
8         TensorFlow Lite           NaN                  NaN
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

Colab Pro CPU

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=cpu, half=False, test=False
Checking setup...
YOLOv5 🚀 v6.1-135-g7926afc torch 1.10.0+cu111 CPU
Setup complete ✅ (8 CPUs, 51.0 GB RAM, 41.5/166.8 GB disk)

Benchmarks complete (241.20s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4623               127.61
1             TorchScript        0.4623               131.23
2                    ONNX        0.4623                69.34
3                OpenVINO        0.4623                66.52
4                TensorRT           NaN                  NaN
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel        0.4623               123.79
7     TensorFlow GraphDef        0.4623               121.57
8         TensorFlow Lite        0.4623               316.61
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN
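
To make the two tables easier to compare, the speedup of each format relative to PyTorch can be computed from the reported inference times. This is a small sketch with the numbers copied from the tables above; formats that report NaN are omitted, and `speedups` is a hypothetical helper name.

```python
# Inference times (ms) copied from the benchmark tables above.
gpu_ms = {
    'PyTorch': 10.19, 'TorchScript': 6.85, 'ONNX': 14.63, 'TensorRT': 1.89,
    'TensorFlow SavedModel': 21.28, 'TensorFlow GraphDef': 21.22,
}
cpu_ms = {
    'PyTorch': 127.61, 'TorchScript': 131.23, 'ONNX': 69.34, 'OpenVINO': 66.52,
    'TensorFlow SavedModel': 123.79, 'TensorFlow GraphDef': 121.57,
    'TensorFlow Lite': 316.61,
}


def speedups(times_ms, baseline='PyTorch'):
    """Return baseline_time / format_time for every format in the table."""
    base = times_ms[baseline]
    return {fmt: base / ms for fmt, ms in times_ms.items()}


for name, table in (('GPU', gpu_ms), ('CPU', cpu_ms)):
    print(name)
    # Sort fastest first so the largest speedup is at the top.
    for fmt, s in sorted(speedups(table).items(), key=lambda kv: -kv[1]):
        print(f'  {fmt:<22} {s:.2f}x')
```

Values above 1 mean the format is faster than PyTorch on that device, and values below 1 mean it is slower.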

Good luck 🍀 and let us know if you have any other questions!

@Sanath1998
Author

@glenn-jocher, my intent was to learn how to convert a model to quantized INT8 precision and run inference in INT8 mode.
Could you please explain this?

@glenn-jocher
Member

@Sanath1998 it depends on the export format. Some already support this, e.g. python export.py --include tflite --int8
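
For readers new to the idea, INT8 quantization maps float values onto 8-bit integers through a scale and a zero point. Below is a minimal sketch of generic affine (asymmetric) quantization in plain Python. It is a conceptual illustration only, not the procedure any particular exporter uses, and `quant_params`, `quantize`, and `dequantize` are hypothetical helper names.

```python
def quant_params(values, qmin=-128, qmax=127):
    """Derive scale and zero point so [min, max] maps onto [qmin, qmax]."""
    # The represented range must include 0 so that 0.0 quantizes exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Round to the nearest integer step and clamp to the INT8 range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.5, -0.2, 0.0, 0.7, 2.3]
scale, zp = quant_params(weights)
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f'scale={scale:.5f} zero_point={zp} max_error={max_err:.5f}')
```

The round trip loses at most half a quantization step per value here, which is the accuracy cost that INT8 export trades for smaller, faster models.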

@Sanath1998
Author

@glenn-jocher
Can't we quantize models exported to other formats, such as ONNX or TensorRT?

@glenn-jocher
Member

Many formats support FP16 with the --half flag

@Sanath1998
Author

Many formats support FP16 with the --half flag

@glenn-jocher the model file is the same size before and after using the --half flag.
Shouldn't the model size decrease once it is reduced to FP16?
Could you please explain this?
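
As a rough sanity check on the expectation above, weight storage scales with bytes per parameter, so FP16 weights should take about half the space of FP32 and INT8 about a quarter, ignoring file metadata and any tensors kept at higher precision. The sketch below does that arithmetic. The figure of about 7.2 million parameters for YOLOv5s is an approximate value assumed here, not one taken from this thread.

```python
BYTES_PER_PARAM = {'fp32': 4, 'fp16': 2, 'int8': 1}
N_PARAMS = 7_200_000  # assumption: approximate YOLOv5s parameter count


def weight_megabytes(n_params, dtype):
    """Estimate raw weight storage in MB (10**6 bytes) for a given precision."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f'{dtype}: ~{weight_megabytes(N_PARAMS, dtype):.1f} MB')
```

If an exported file does not shrink after passing --half, one thing worth checking is whether the chosen format and device actually honored the flag during export.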

@glenn-jocher
Member

glenn-jocher commented Nov 7, 2022

👋 hi, thanks for letting us know about this possible problem with YOLOv5 🚀.

Not all formats support --half and --int8. See export.py code for details.

We've created a few short guidelines below to help users provide what we need in order to start investigating a possible problem.

How to create a Minimal, Reproducible Example

When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimum reproducible example. Your code that reproduces the problem should be:

  • Minimal – Use as little code as possible to produce the problem
  • Complete – Provide all parts someone else needs to reproduce the problem
  • Reproducible – Test the code you're about to provide to make sure it reproduces the problem

For Ultralytics to provide assistance your code should also be:

  • Current – Verify that your code is up-to-date with GitHub master, and if necessary git pull or git clone a new copy to ensure your problem has not already been solved in master.
  • Unmodified – Your problem must be reproducible using official YOLOv5 code without changes. Ultralytics does not provide support for custom code ⚠️.

If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimum reproducible example to help us better understand and diagnose your problem.

Thank you! 😃

@github-actions
Contributor

github-actions bot commented Dec 8, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@github-actions github-actions bot added the Stale label Dec 8, 2022
@github-actions github-actions bot closed this as not planned Dec 19, 2022