
TensorFlow SegmentationModel support #9472

Merged
merged 9 commits into from
Sep 18, 2022

Conversation

glenn-jocher
Member

@glenn-jocher glenn-jocher commented Sep 18, 2022

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Improvements in TensorFlow model export and inference functionalities in the YOLOv5 repository.

πŸ“Š Key Changes

  • πŸ› οΈ Modified ci-testing.yml to add a --hard-fail threshold for segmentation model benchmarking.
  • πŸ€– Updated export.py to handle TensorFlow SavedModel outputs more robustly and support models with variable output structures.
  • 🧠 Refined common.py and tf.py inference code to improve the processing of TensorFlow model outputs and adapt to different types of model architectures.

🎯 Purpose & Impact

  • 🏁 The addition of --hard-fail in the benchmark testing ensures that the segmentation models meet a minimum performance threshold, enhancing quality control.
  • πŸ”„ The revised TensorFlow export and inference logic adds flexibility for models with different output-head structures, making it more reliable for developers and researchers to export and run the various TensorFlow formats.
  • πŸ“ˆ These changes should benefit users by improving the stability and versatility of YOLOv5's TensorFlow export and inference capabilities, ultimately making it easier and more efficient to deploy the models across diverse platforms.

@glenn-jocher glenn-jocher self-assigned this Sep 18, 2022
@glenn-jocher
Member Author

@zldrobit I'm working on adding TF support for Segmentation models, and I had a question. In TFDetect we normalize the bounding box coordinates here:

yolov5/models/tf.py

Lines 308 to 309 in 92b5242

xy /= tf.constant([[self.imgsz[1], self.imgsz[0]]], dtype=tf.float32)
wh /= tf.constant([[self.imgsz[1], self.imgsz[0]]], dtype=tf.float32)

And then we have to denormalize them here during inference. This is a problem now because I want to use DetectMultiBackend for ClassificationModel and SegmentationModel support in addition to DetectionModels, so the denormalization op will hurt ClassificationModels.

y[..., :4] *= [w, h, w, h] # xywh normalized to pixels

Can we remove the normalize-denormalize op? Is it only there for quantization improvement? Are you sure it's helping the quantization?
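For context, a minimal NumPy sketch of the normalize/denormalize round trip under discussion (the image size and example box are illustrative, not repo code):

```python
import numpy as np

imgsz = (640, 640)  # (height, width), assumed export size

# In TFDetect: divide xy/wh by the image size so coordinates land in 0-1
xywh = np.array([[320.0, 160.0, 64.0, 128.0]])               # box in pixels
xywh_norm = xywh / [imgsz[1], imgsz[0], imgsz[1], imgsz[0]]  # -> 0-1

# In DetectMultiBackend inference: scale back to pixels before NMS
w, h = imgsz[1], imgsz[0]
xywh_back = xywh_norm * [w, h, w, h]  # xywh normalized to pixels
assert np.allclose(xywh_back, xywh)
```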

@glenn-jocher glenn-jocher merged commit fda8aa5 into master Sep 18, 2022
@glenn-jocher glenn-jocher deleted the tf/update branch September 18, 2022 17:52
@zldrobit
Contributor

@glenn-jocher

Can we remove the normalize-denormalize op?

If YOLOv5 has to support detection model export for int8 TFLite models, the normalize/denormalize code has to be kept.

Is it only there for quantization improvement?

Yes. Removing the normalize/denormalize code does not affect export or inference for TFLite models at fp32/fp16 precision.

Are you sure it's helping the quantization?

Yes, I can confirm that. With TF 2.4/2.5/2.6, removing the normalize/denormalize code makes accuracy drop drastically after int8 quantization. I also tested with TF 2.9.2/2.10.0: int8 TFLite models exported without the normalization code have zero mAP.

The reason for the normalization is that TensorFlow keeps only one set of bias/scale factors when (de)quantizing a tensor's input/output. Thus, all values in a tensor have to be normalized to the same range (e.g. 0-1). YOLOv5 currently concatenates bbox coordinates, bbox confidence and class probabilities into one tensor in the Detect module. Before the concatenation, the bbox coordinates are normalized to 0-1 to reduce the quantization error in int8 TFLite quantization.
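A rough numeric sketch of that effect (my own illustration with made-up values, simulating per-tensor int8 quantization; not code from this PR):

```python
import numpy as np

def fake_quant(x, scale, zero_point=0):
    # Simulate TFLite per-tensor affine quantization to int8 and back
    q = np.clip(np.round(x / scale) + zero_point, -128, 127)
    return (q - zero_point) * scale

# One tensor mixing pixel-space boxes (0-640) with class probabilities (0-1):
mixed = np.array([320.0, 480.0, 0.92, 0.03])
print(fake_quant(mixed, scale=mixed.max() / 127))  # probabilities collapse to 0.0

# With boxes normalized to 0-1, every channel shares the same range:
norm = np.array([0.5, 0.75, 0.92, 0.03])
print(fake_quant(norm, scale=norm.max() / 127))    # probabilities survive
```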

@glenn-jocher
Member Author

@zldrobit I see. It's difficult then, not sure what to do.

I was looking at this. Are we using per-axis or per-tensor quantization? If we moved to per-axis would this help?
https://www.tensorflow.org/lite/performance/quantization_spec

I've seen that even with the current TFLite INT8 method we lose significant mAP vs FP16, which doesn't happen with CoreML. I don't have the validation results handy, but I'll re-run them now and post. I think the drop may be about 20%.

@glenn-jocher
Member Author

glenn-jocher commented Sep 21, 2022

@zldrobit ok here's my test:

PyTorch

!python val.py --weights yolov5s.pt --batch 1
val: data=data/coco128.yaml, weights=['yolov5s.pt'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-149-g77dcf55 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
val: Scanning '/content/datasets/coco128/labels/train2017' images and labels...126 found, 2 missing, 0 empty, 0 corrupt: 100% 128/128 [00:00<00:00, 3471.77it/s]
val: New cache created: /content/datasets/coco128/labels/train2017.cache
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 128/128 [00:04<00:00, 29.22it/s]
                   all        128        929      0.699      0.633      0.704      0.473
Speed: 0.2ms pre-process, 10.7ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp

CoreML

(venv) glennjocher@Glenns-MacBook-Air yolov5 % python val.py --weights yolov5s-fp16.mlmodel --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-fp16.mlmodel'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-151-g5ff8c10 Python-3.10.6 torch-1.13.0.dev20220828 CPU

Loading yolov5s-fp16.mlmodel for CoreML inference...
TensorFlow version 2.10.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.8.0 is the most recent version that has been tested.
Torch version 1.13.0.dev20220828 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/Users/glennjocher/PycharmProjects/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:00<?, ?it/s]     
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:02<00:00, 48.93it/s]                                                             
                   all        128        929      0.681      0.653      0.711      0.472
Speed: 0.4ms pre-process, 13.8ms inference, 2.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp6

(venv) glennjocher@Glenns-MacBook-Air yolov5 % python val.py --weights yolov5s-int8.mlmodel --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-int8.mlmodel'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-151-g5ff8c10 Python-3.10.6 torch-1.13.0.dev20220828 CPU

Loading yolov5s-int8.mlmodel for CoreML inference...
TensorFlow version 2.10.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.8.0 is the most recent version that has been tested.
Torch version 1.13.0.dev20220828 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/Users/glennjocher/PycharmProjects/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:00<?, ?it/s]     
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:02<00:00, 49.59it/s]                                                             
                   all        128        929      0.683      0.639      0.702      0.465
Speed: 0.3ms pre-process, 13.6ms inference, 2.4ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp5

TFLite (TODO)

!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install

!python export.py --include tflite
!python export.py --include tflite --int8

!python val.py --weights yolov5s-fp16.tflite --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-fp16.tflite'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-149-g77dcf55 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Loading yolov5s-fp16.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/content/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100% 128/128 [00:00<?, ?it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 128/128 [00:40<00:00,  3.15it/s]
                   all        128        929       0.68      0.653       0.71      0.471
Speed: 0.4ms pre-process, 303.4ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp2

!python val.py --weights yolov5s-int8.tflite --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-int8.tflite'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-149-g77dcf55 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Loading yolov5s-int8.tflite for TensorFlow Lite inference...
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/content/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100% 128/128 [00:00<?, ?it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 128/128 [48:12<00:00, 22.60s/it]
                   all        128        929      0.709       0.58       0.68      0.425
Speed: 0.4ms pre-process, 22581.7ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp3

@zldrobit
Contributor

@glenn-jocher

I was looking at this. Are we using per-axis or per-tensor quantization? If we moved to per-axis would this help?

TensorFlow may use per-axis quantization for weights, but it uses per-tensor quantization for computation input/output according to https://www.tensorflow.org/lite/performance/quantization_spec. I've excerpted the relevant paragraph below:

Per-axis (aka per-channel in Conv ops) or per-tensor weights are represented by int8 two’s complement values in the range [-127, 127] with zero-point equal to 0. Per-tensor activations/inputs are represented by int8 two’s complement values in the range [-128, 127], with a zero-point in range [-128, 127].

This is a limitation of TensorFlow, so we cannot move the computation input/output to per-axis quantization.
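For reference, a small sketch of how to check what the converter actually produced, using the standard tf.lite.Interpreter API (the yolov5s-int8.tflite path is just an assumed local file):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="yolov5s-int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_output_details():
    qp = detail["quantization_parameters"]
    print(detail["name"], detail["dtype"])
    print("  scales:", qp["scales"])            # a single scale -> per-tensor
    print("  zero_points:", qp["zero_points"])  # per-axis would list one per channel
```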

I've seen even with the current TFLite INT8 method we lose significant mAP vs FP16, which doesn't happen with CoreML. I don't have the validation results handy but I'll re-run them now and post. I think the drop may be about 20%.

According to Performance Evaluation of INT8 Quantized Inference on Mobile GPUs and https://coremltools.readme.io/docs/quantization (figures omitted here):

As I understand it, CoreML only supports int8 quantization for weights, while it still computes in fp32/fp16 precision. So although yolov5s-int8.mlmodel is int8 quantized, its inference runs at fp32/fp16 precision. IMHO, yolov5s-int8.mlmodel is therefore expected to have a higher mAP than yolov5s-int8.tflite, since yolov5s-int8.tflite runs inference at the lower int8 precision.
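For what it's worth, a minimal sketch of that weight-only quantization step with coremltools (assuming a local yolov5s.mlmodel; only the weights change, compute precision does not):

```python
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

model = ct.models.MLModel("yolov5s.mlmodel")  # fp32/fp16 CoreML model (assumed path)

# Quantize only the weights to 8 bits; activations/compute stay fp32/fp16,
# which is why the int8 .mlmodel keeps near-FP16 mAP
model_int8 = quantization_utils.quantize_weights(model, nbits=8)
model_int8.save("yolov5s-int8.mlmodel")  # on macOS this returns a saveable MLModel
```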

@glenn-jocher
Member Author

@zldrobit understood, thanks for the analysis. I guess I'll leave the normalisation alone for now until I can find a better solution.

For PyTorch model inference we can determine the type of model (ClassificationModel, SegmentationModel, DetectionModel) and use that to decide whether to de-normalize boxes, but if someone loads an exported TF model I'm not exactly sure how to do the same, since they all load as TFModel types.

Do you know if we can search the model for TFDetect or TFSegment classes to confirm it needs denormalization? If they're missing, it's likely a classification model and we can skip the denormalization.
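For the PyTorch side, a minimal sketch of that check (assuming the model classes defined in models/yolo.py; the helper name is made up):

```python
from models.yolo import ClassificationModel, DetectionModel, SegmentationModel

def needs_box_denormalization(model):
    # Classification models output class scores only, so there are no boxes
    # to scale back to pixels; detection/segmentation heads output xywh boxes
    if isinstance(model, ClassificationModel):
        return False
    return isinstance(model, (DetectionModel, SegmentationModel))
```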

@zldrobit
Contributor

@glenn-jocher The class information (e.g. TFDetect or TFSegment) could be saved in TF SavedModel format, just like in a PyTorch model. The TF GraphDef (.pb) format does not hold class information. One can save class information in TFLite models' metadata (https://www.tensorflow.org/lite/models/convert/metadata). I am wondering whether using a filename with the -cls suffix would be an easier way to identify a YOLOv5 classification model.
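A minimal sketch of that filename heuristic (the function name and exact suffixes are illustrative):

```python
from pathlib import Path

def guess_task_from_filename(weights):
    """Guess the YOLOv5 task from a weights filename, e.g. 'yolov5s-cls.tflite'."""
    stem = Path(weights).stem.lower()
    if stem.endswith("-cls"):
        return "classify"  # no boxes, skip denormalization
    if stem.endswith("-seg"):
        return "segment"   # boxes + mask protos, denormalize boxes
    return "detect"        # default: denormalize boxes

print(guess_task_from_filename("yolov5s-cls.tflite"))   # classify
print(guess_task_from_filename("yolov5s-int8.tflite"))  # detect
```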

@glenn-jocher
Member Author

@zldrobit That's a great suggestion. A naming convention like a -cls filename suffix for YOLOv5 classification models would make it easy to identify the model type at load time, and would simplify the logic for deciding whether denormalization is necessary.

I'll explore this approach further and see how we can incorporate it into the model loading process.

Thanks for the input!
