What's Changed
General changes:
- Quantization enhancements:
- Improved quantization parameters: Backpropagate the threshold of concatenation layers. This helps to minimize data loss during the quantization of these layer types.
- Improved weights quantization parameters selection: Introduced a Hessian-based MSE (HMSE) quantization error method.
- Set weights_error_method to QuantizationErrorMethod.HMSE in QuantizationConfig in CoreConfig.
- Currently, this feature is only available in GPTQ due to the increased runtime required for Hessian computation.
- Improved mixed precision: Use normalized MSE as the distance metric in mixed precision sensitivity evaluation for non-Hessian-based methods.
- Improved mixed precision runtime: Added a validation step that determines whether reaching the requested target resource utilization requires mixed precision, or whether it can be reached by quantizing the whole model at the maximal available bit-width.
- Automatically removed identity layers to improve graph optimizations.
- Introduced TPC IMX500.v2:
- Enabled a new feature: metadata. The metadata is a dictionary, saved in the model file and in the model object, that contains information about the MCT environment (e.g., MCT version, framework version).
- Quantize unfolded BatchNorm layers.
- The default TPC remains IMX500.v1. To select IMX500.v2, use:
import model_compression_toolkit as mct
tpc_v2 = mct.get_target_platform_capabilities("tensorflow", "imx500", target_platform_version="v2")
quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(model, representative_data_gen, target_platform_capabilities=tpc_v2)
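The two mixed precision items in the quantization enhancements above can be illustrated with a small, library-independent sketch. It is not MCT's implementation: the function names (normalized_mse, weights_memory_bytes, requires_mixed_precision), the choice of weights memory as the resource measure, and the exact normalization (dividing by the mean squared float output) are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def normalized_mse(float_out: Sequence[float],
                   quant_out: Sequence[float],
                   eps: float = 1e-12) -> float:
    """MSE between float and quantized outputs, divided by the mean
    squared float output (assumed normalization, for illustration only)."""
    if len(float_out) != len(quant_out):
        raise ValueError("outputs must have the same length")
    n = len(float_out)
    mse = sum((f - q) ** 2 for f, q in zip(float_out, quant_out)) / n
    power = sum(f ** 2 for f in float_out) / n
    # eps guards against division by zero for an all-zero float output.
    return mse / (power + eps)


def weights_memory_bytes(layer_params: Dict[str, int], bits: int) -> float:
    """Hypothetical resource measure: weights memory at one uniform bit-width."""
    return sum(layer_params.values()) * bits / 8


def requires_mixed_precision(layer_params: Dict[str, int],
                             max_bits: int,
                             target_bytes: float) -> bool:
    """Return True only if the maximal bit-width configuration exceeds the
    target, in which case a mixed precision search would be needed."""
    return weights_memory_bytes(layer_params, max_bits) > target_bytes


if __name__ == "__main__":
    # Hypothetical model with 8,000 weight parameters in total.
    layers = {"conv1": 1_000, "conv2": 4_000, "fc": 3_000}
    print(requires_mixed_precision(layers, max_bits=8, target_bytes=10_000))
    print(requires_mixed_precision(layers, max_bits=8, target_bytes=6_000))
    # The same relative error at different output scales gives the same score.
    print(normalized_mse([1.0, 2.0], [1.1, 2.2]))
    print(normalized_mse([10.0, 20.0], [11.0, 22.0]))
```

Normalizing the error makes sensitivity scores comparable across layers whose outputs differ in scale. Checking the maximal bit-width configuration first is what lets the mixed precision search be skipped when that configuration already fits the target.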
Tutorials
MCT tutorial notebook updates:
- Reorganized the tutorials into separate sections: IMX500 and MCT features.
- Added new tutorials for IMX500: quantization of the YOLOv8n object detection model in Keras and PyTorch, including an optional Gradient-Based PTQ step for optimized performance.
- Removed the “quick-start” integration tool from MCT.
Breaking changes:
- TF 2.11 is no longer supported.
Bug fixes:
- Fixed a bug in the GPTQ parameters update.
- Fixed a bug in the similarity analyzer when bias correction is used.
- Fixed a bug in logging tf.image.combined_non_max_suppression to TensorBoard (#1055).