
TensorFlow SegmentationModel support #9472

Merged
merged 9 commits into from
Sep 18, 2022

Conversation

glenn-jocher
Member

@glenn-jocher glenn-jocher commented Sep 18, 2022

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Improvements in TensorFlow model export and inference functionalities in the YOLOv5 repository.

πŸ“Š Key Changes

  • πŸ› οΈ Modified ci-testing.yml to add a --hard-fail threshold for segmentation model benchmarking.
  • πŸ€– Updated export.py to handle TensorFlow SavedModel outputs more robustly and support models with variable output structures.
  • 🧠 Refined common.py and tf.py inference code to improve the processing of TensorFlow model outputs and adapt to different types of model architectures.

🎯 Purpose & Impact

  • 🏁 The addition of --hard-fail in the benchmark testing ensures that the segmentation models meet a minimum performance threshold, enhancing quality control.
  • πŸ”„ The revised TensorFlow export and inference logic adds flexibility for models with different output-head structures, making it more reliable for developers and researchers to export and run the various TensorFlow formats.
  • πŸ“ˆ These changes should benefit users by improving the stability and versatility of YOLOv5's TensorFlow export and inference capabilities, ultimately making it easier and more efficient to deploy the models across diverse platforms.

@glenn-jocher glenn-jocher self-assigned this Sep 18, 2022
@glenn-jocher
Member Author

@zldrobit I'm working on adding TF support for Segmentation models, and I had a question. In TFDetect we normalize the bounding box coordinates here:

yolov5/models/tf.py

Lines 308 to 309 in 92b5242

xy /= tf.constant([[self.imgsz[1], self.imgsz[0]]], dtype=tf.float32)
wh /= tf.constant([[self.imgsz[1], self.imgsz[0]]], dtype=tf.float32)

And then we have to denormalize them here during inference. This is a problem now because I want to use DetectMultiBackend for ClassificationModel and SegmentationModel support in addition to DetectionModels, so the denormalization op will hurt ClassificationModels.

y[..., :4] *= [w, h, w, h] # xywh normalized to pixels

Can we remove the normalize-denormalize op? Is it only there for quantization improvement? Are you sure it's helping the quantization?
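For context, a minimal NumPy sketch of the normalize/denormalize round trip under discussion (the image size and example box are illustrative, not repo code):

```python
import numpy as np

imgsz = (640, 640)  # (height, width), assumed export size

# In TFDetect: divide xy/wh by the image size so coordinates land in 0-1
xywh = np.array([[320.0, 160.0, 64.0, 128.0]])               # box in pixels
xywh_norm = xywh / [imgsz[1], imgsz[0], imgsz[1], imgsz[0]]  # -> 0-1

# In DetectMultiBackend inference: scale back to pixels before NMS
w, h = imgsz[1], imgsz[0]
xywh_back = xywh_norm * [w, h, w, h]  # xywh normalized to pixels
assert np.allclose(xywh_back, xywh)
```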

@glenn-jocher glenn-jocher merged commit fda8aa5 into master Sep 18, 2022
@glenn-jocher glenn-jocher deleted the tf/update branch September 18, 2022 17:52
@zldrobit
Contributor

@glenn-jocher

Can we remove the normalize-denormalize op?

If YOLOv5 has to support detection model export for int8 TFLite models, the normalize/denormalize code has to be kept.

Is it only there for quantization improvement?

Yes. Removing the normalize/denormalize code does not affect export or inference for TFLite models at fp32/fp16 precision.

Are you sure it's helping the quantization?

Yes, I can confirm that. With TF 2.4/2.5/2.6, removing the normalize/denormalize code makes accuracy drop drastically after int8 quantization. I also tested with TF 2.9.2/2.10.0: int8 TFLite models exported without the normalization code have zero mAP.

The reason for the normalization is that TensorFlow keeps only one set of bias/scale factors when (de)quantizing a tensor's input/output. Thus, all values in a tensor have to be normalized to the same range (e.g. 0-1). YOLOv5 currently concatenates bbox coordinates, bbox confidence and class probabilities into one tensor in the Detect module. Before the concatenation, the bbox coordinates are normalized to 0-1 to reduce the quantization error in int8 TFLite quantization.
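A rough numeric sketch of that effect (my own illustration with made-up values, simulating per-tensor int8 quantization; not code from this PR):

```python
import numpy as np

def fake_quant(x, scale, zero_point=0):
    # Simulate TFLite per-tensor affine quantization to int8 and back
    q = np.clip(np.round(x / scale) + zero_point, -128, 127)
    return (q - zero_point) * scale

# One tensor mixing pixel-space boxes (0-640) with class probabilities (0-1):
mixed = np.array([320.0, 480.0, 0.92, 0.03])
print(fake_quant(mixed, scale=mixed.max() / 127))  # probabilities collapse to 0.0

# With boxes normalized to 0-1, every channel shares the same range:
norm = np.array([0.5, 0.75, 0.92, 0.03])
print(fake_quant(norm, scale=norm.max() / 127))    # probabilities survive
```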

@glenn-jocher
Member Author

@zldrobit I see. It's difficult then, not sure what to do.

I was looking at this. Are we using per-axis or per-tensor quantization? If we moved to per-axis would this help?
https://www.tensorflow.org/lite/performance/quantization_spec

I've seen that even with the current TFLite INT8 method we lose significant mAP vs FP16, which doesn't happen with CoreML. I don't have the validation results handy, but I'll re-run them now and post. I think the drop may be about 20%.

@glenn-jocher
Member Author

glenn-jocher commented Sep 21, 2022

@zldrobit ok here's my test:

PyTorch

!python val.py --weights yolov5s.pt --batch 1
val: data=data/coco128.yaml, weights=['yolov5s.pt'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-149-g77dcf55 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
val: Scanning '/content/datasets/coco128/labels/train2017' images and labels...126 found, 2 missing, 0 empty, 0 corrupt: 100% 128/128 [00:00<00:00, 3471.77it/s]
val: New cache created: /content/datasets/coco128/labels/train2017.cache
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 128/128 [00:04<00:00, 29.22it/s]
                   all        128        929      0.699      0.633      0.704      0.473
Speed: 0.2ms pre-process, 10.7ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp

CoreML

(venv) glennjocher@Glenns-MacBook-Air yolov5 % python val.py --weights yolov5s-fp16.mlmodel --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-fp16.mlmodel'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-151-g5ff8c10 Python-3.10.6 torch-1.13.0.dev20220828 CPU

Loading yolov5s-fp16.mlmodel for CoreML inference...
TensorFlow version 2.10.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.8.0 is the most recent version that has been tested.
Torch version 1.13.0.dev20220828 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/Users/glennjocher/PycharmProjects/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:00<?, ?it/s]     
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:02<00:00, 48.93it/s]                                                             
                   all        128        929      0.681      0.653      0.711      0.472
Speed: 0.4ms pre-process, 13.8ms inference, 2.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp6

(venv) glennjocher@Glenns-MacBook-Air yolov5 % python val.py --weights yolov5s-int8.mlmodel --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-int8.mlmodel'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-151-g5ff8c10 Python-3.10.6 torch-1.13.0.dev20220828 CPU

Loading yolov5s-int8.mlmodel for CoreML inference...
TensorFlow version 2.10.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.8.0 is the most recent version that has been tested.
Torch version 1.13.0.dev20220828 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/Users/glennjocher/PycharmProjects/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:00<?, ?it/s]     
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 128/128 [00:02<00:00, 49.59it/s]                                                             
                   all        128        929      0.683      0.639      0.702      0.465
Speed: 0.3ms pre-process, 13.6ms inference, 2.4ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp5

TFLite (TODO)

!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt  # install

!python export.py --include tflite
!python export.py --include tflite --int8

!python val.py --weights yolov5s-fp16.tflite --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-fp16.tflite'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-149-g77dcf55 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Loading yolov5s-fp16.tflite for TensorFlow Lite inference...
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/content/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100% 128/128 [00:00<?, ?it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 128/128 [00:40<00:00,  3.15it/s]
                   all        128        929       0.68      0.653       0.71      0.471
Speed: 0.4ms pre-process, 303.4ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp2

!python val.py --weights yolov5s-int8.tflite --batch 1
val: data=data/coco128.yaml, weights=['yolov5s-int8.tflite'], batch_size=1, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 πŸš€ v6.2-149-g77dcf55 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)

Loading yolov5s-int8.tflite for TensorFlow Lite inference...
Forcing --batch-size 1 square inference (1,3,640,640) for non-PyTorch models
val: Scanning '/content/datasets/coco128/labels/train2017.cache' images and labels... 126 found, 2 missing, 0 empty, 0 corrupt: 100% 128/128 [00:00<?, ?it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100% 128/128 [48:12<00:00, 22.60s/it]
                   all        128        929      0.709       0.58       0.68      0.425
Speed: 0.4ms pre-process, 22581.7ms inference, 1.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/val/exp3

@zldrobit
Contributor

@glenn-jocher

I was looking at this. Are we using per-axis or per-tensor quantization? If we moved to per-axis would this help?

TensorFlow may use per-axis quantization for weights, but it uses per-tensor quantization for computation input/output according to https://www.tensorflow.org/lite/performance/quantization_spec. I've excerpted the relevant paragraph below:

Per-axis (aka per-channel in Conv ops) or per-tensor weights are represented by int8 two’s complement values in the range [-127, 127] with zero-point equal to 0. Per-tensor activations/inputs are represented by int8 two’s complement values in the range [-128, 127], with a zero-point in range [-128, 127].

This is a limitation of TensorFlow, so we cannot move the computation input/output to per-axis quantization.
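For reference, a small sketch of how to check what the converter actually produced, using the standard tf.lite.Interpreter API (the yolov5s-int8.tflite path is just an assumed local file):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="yolov5s-int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_output_details():
    qp = detail["quantization_parameters"]
    print(detail["name"], detail["dtype"])
    print("  scales:", qp["scales"])            # a single scale -> per-tensor
    print("  zero_points:", qp["zero_points"])  # per-axis would list one per channel
```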

I've seen even with the current TFLite INT8 method we lose significant mAP vs FP16, which doesn't happen with CoreML. I don't have the validation results handy but I'll re-run them now and post. I think the drop may be about 20%.

According to Performance Evaluation of INT8 Quantized Inference on Mobile GPUs and https://coremltools.readme.io/docs/quantization (figures omitted here):

As I understand it, CoreML only supports int8 quantization for weights, while it still computes in fp32/fp16 precision. So although yolov5s-int8.mlmodel is int8 quantized, its inference runs at fp32/fp16 precision. IMHO, yolov5s-int8.mlmodel is therefore expected to have a higher mAP than yolov5s-int8.tflite, since yolov5s-int8.tflite runs inference at the lower int8 precision.
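For what it's worth, a minimal sketch of that weight-only quantization step with coremltools (assuming a local yolov5s.mlmodel; only the weights change, compute precision does not):

```python
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

model = ct.models.MLModel("yolov5s.mlmodel")  # fp32/fp16 CoreML model (assumed path)

# Quantize only the weights to 8 bits; activations/compute stay fp32/fp16,
# which is why the int8 .mlmodel keeps near-FP16 mAP
model_int8 = quantization_utils.quantize_weights(model, nbits=8)
model_int8.save("yolov5s-int8.mlmodel")  # on macOS this returns a saveable MLModel
```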

@glenn-jocher
Member Author

@zldrobit understood, thanks for the analysis. I guess I'll leave the normalisation alone for now until I can find a better solution.

For PyTorch model inference we can determine the type of model (ClassificationModel, SegmentationModel, DetectionModel) and use that to decide whether to de-normalize boxes, but if someone loads an exported TF model I'm not exactly sure how to do the same, since they all load as TFModel types.

Do you know if we can search the model for TFDetect or TFSegment classes to confirm it needs denormalization? If they're missing, it's likely a classification model and we can skip the denormalization.
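For the PyTorch side, a minimal sketch of that check (assuming the model classes defined in models/yolo.py; the helper name is made up):

```python
from models.yolo import ClassificationModel, DetectionModel, SegmentationModel

def needs_box_denormalization(model):
    # Classification models output class scores only, so there are no boxes
    # to scale back to pixels; detection/segmentation heads output xywh boxes
    if isinstance(model, ClassificationModel):
        return False
    return isinstance(model, (DetectionModel, SegmentationModel))
```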

@zldrobit
Contributor

@glenn-jocher The class information (e.g. TFDetect or TFSegment) could be saved in TF SavedModel format, just like in a PyTorch model. The TF GraphDef (.pb) format does not hold class information. One can save class information in TFLite models' metadata (https://www.tensorflow.org/lite/models/convert/metadata). I am wondering whether using a filename with the -cls suffix would be an easier way to identify a YOLOv5 classification model.
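A minimal sketch of that filename heuristic (the function name and exact suffixes are illustrative):

```python
from pathlib import Path

def guess_task_from_filename(weights):
    """Guess the YOLOv5 task from a weights filename, e.g. 'yolov5s-cls.tflite'."""
    stem = Path(weights).stem.lower()
    if stem.endswith("-cls"):
        return "classify"  # no boxes, skip denormalization
    if stem.endswith("-seg"):
        return "segment"   # boxes + mask protos, denormalize boxes
    return "detect"        # default: denormalize boxes

print(guess_task_from_filename("yolov5s-cls.tflite"))   # classify
print(guess_task_from_filename("yolov5s-int8.tflite"))  # detect
```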

@glenn-jocher
Member Author

@zldrobit That's a great suggestion. A naming convention like a -cls filename suffix for YOLOv5 classification models would make it easy to identify the model type at load time, and would simplify the logic for deciding whether denormalization is necessary.

I'll explore this approach further and see how we can incorporate it into the model loading process.

Thanks for the input!
