Releases: sony/model_optimization
Release 2.1.0
What's Changed
General changes:
- Quantization enhancements:
- Improved quantization parameters: Backpropagate the threshold of concatenation layers. This helps to minimize data loss during the quantization of these layer types.
- Improved weights quantization parameters selection: Introduced Hessian-based MSE quantization error method.
- Set weights_error_method to QuantizationErrorMethod.HMSE in the QuantizationConfig of CoreConfig.
- Currently, this feature is only available in GPTQ due to the increased runtime required for Hessian computation.
- Improved mixed precision: Use normalized MSE as distance metric in mixed precision sensitivity evaluation for non Hessian-based methods.
- Improved mixed precision runtime: Added a validation step to determine whether quantizing the model to a requested target resource utilization requires mixed precision, or it can be achieved by quantizing the model to the maximal bit-width precision available.
- Automatically removed identity layers to improve graph optimizations.
- Introduced TPC IMX500.v2:
- Enabled a new feature: metadata. The metadata is a dictionary, saved in both the model file and the model object, that contains information about the MCT environment (e.g. MCT version, framework version, etc.).
- Quantize unfolded BatchNorm layers.
- Default TPC remains IMX500.v1. For selecting IMX500.v2 use:
tpc_v2 = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version="v2")
mct.ptq.keras_post_training_quantization(model, representative_data_gen, target_platform_capabilities=tpc_v2)
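As a rough illustration of the normalized MSE distance mentioned above for mixed precision sensitivity evaluation, the sketch below divides the plain MSE by the mean energy of the float output, so layers with very different activation scales become comparable. The function name `normalized_mse`, the `eps` guard, and the choice of normalizer are assumptions made for illustration; MCT's internal definition may differ.

```python
from typing import Sequence


def normalized_mse(float_out: Sequence[float],
                   quant_out: Sequence[float],
                   eps: float = 1e-12) -> float:
    """MSE between two flattened tensors, divided by the mean energy of the float one.

    Hypothetical helper for illustration only; not part of the MCT API.
    """
    if len(float_out) != len(quant_out):
        raise ValueError("tensors must have the same number of elements")
    if len(float_out) == 0:
        raise ValueError("tensors must not be empty")
    n = len(float_out)
    mse = sum((f - q) ** 2 for f, q in zip(float_out, quant_out)) / n
    energy = sum(f ** 2 for f in float_out) / n
    # eps avoids division by zero for an all-zero float tensor.
    return mse / (energy + eps)


# Two outputs with the same 5% relative error but very different scales.
small = [0.1, -0.2, 0.3]
large = [100.0, -200.0, 300.0]
small_q = [v * 1.05 for v in small]
large_q = [v * 1.05 for v in large]

print(normalized_mse(small, small_q))
print(normalized_mse(large, large_q))
```

With plain MSE the second layer would look about a million times more sensitive than the first; after normalization both report roughly the same distance.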
Tutorials
MCT tutorial notebooks updates:
- Reorganized the tutorials into separate sections: IMX500 and MCT features.
- Added new tutorials for IMX500: an object detection YOLOv8n quantization in Keras and PyTorch, including an optional Gradient-Based PTQ step for optimized performance.
- Removed the “quick-start” integration tool from MCT.
Breaking changes:
- TF 2.11 is no longer supported.
Bug fixes:
- Fixed a bug in the GPTQ parameters update.
- Fixed a bug in the similarity analyzer when bias correction is used.
- Fixed a bug in logging tf.image.combined_non_max_suppression to Tensorboard (#1055).
Release 2.0.0
What's Changed
Major updates:
- Structured pruning for PyTorch models: MCT now employs structured and hardware-aware pruning for PyTorch models in addition to Keras models.
- Additional details can be found here.
- Try our tutorial Structured Pruning of a Fully-Connected PyTorch Model
- API changes - The MCT "experimental" API has been formalized. Refer to the API documentation for details.
- Quantization parameters search is now faster! Enhanced vectorized search for per-channel threshold delivers quicker results.
General changes:
- Tensorflow 2.15 is now supported.
- Model Statistics Collector - Improved robustness to representative datasets:
- The mean estimator has been switched from IIR to standard mean.
- Activations merged histogram is now with a fixed bin width.
- PyTorch Model Export - the exporter API now includes an argument for specifying the ONNX OPSET version. View the updated API here. The default OPSET version was set to version 15.
- Target Platform Capabilities (TPC) - 'torch.squeeze' operation has been added to the TPC’s “no quantization” list.
- Mixed Precision - several modifications for improving mixed precision usability and stability:
- Hessian-based scores are now disabled by default (faster execution).
- A new API for enabling different weighting methods through the MixedPrecisionQuantizationConfiguration (MpDistanceWeighting enum).
- A rare numerical issue in distance metric computation has been fixed.
- MixedPrecisionQuantizationConfig is now initialized by default in CoreConfig.
- Internal representation of constants in a model has been modified, resolving issues #918 and #812.
- MCT Troubleshooting and FAQ - check out the Quantization Troubleshooting for common pitfalls and some tools to improve quantization accuracy as well as the FAQ for common issues.
- MCT tutorials – additional tutorials have been added to provide insights of MCT features such as z-score threshold tutorial and quantization parameters search tutorial. In addition, a new tutorial demonstrating semantic segmentation model quantization has been added.
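The switch from an IIR mean estimator to a standard mean in the Model Statistics Collector can be illustrated with a small sketch. The IIR form below (an exponential moving average with a `beta` coefficient) is an assumption about what such an estimator typically looks like, not a copy of MCT's previous code; both function names are hypothetical.

```python
from typing import Iterable


def iir_mean(batch_means: Iterable[float], beta: float = 0.9) -> float:
    """Exponential (IIR) running mean: the result depends on batch order."""
    estimate = None
    for m in batch_means:
        estimate = m if estimate is None else beta * estimate + (1.0 - beta) * m
    if estimate is None:
        raise ValueError("at least one batch is required")
    return estimate


def standard_mean(batch_means: Iterable[float]) -> float:
    """Plain arithmetic mean: every batch has the same weight."""
    values = list(batch_means)
    if not values:
        raise ValueError("at least one batch is required")
    return sum(values) / len(values)


# The same four per-batch means, presented in two different orders.
order_a = [1.0, 1.0, 1.0, 5.0]
order_b = [5.0, 1.0, 1.0, 1.0]

print(iir_mean(order_a), iir_mean(order_b))
print(standard_mean(order_a), standard_mean(order_b))
```

The IIR estimate changes with the order of the batches, while the standard mean does not, which is one way to read the "improved robustness to representative datasets" claim.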
Breaking changes:
- API changes
- Removal of “experimental” from PTQ and GPTQ quantization facades
- keras_post_training_quantization_experimental -> keras_post_training_quantization.
- keras_gradient_post_training_quantization_experimental -> keras_gradient_post_training_quantization
- pytorch_post_training_quantization_experimental -> pytorch_post_training_quantization.
- pytorch_gradient_post_training_quantization_experimental -> pytorch_gradient_post_training_quantization
- Renaming QAT methods
- keras_quantization_aware_training_init -> keras_quantization_aware_training_init_experimental.
- keras_quantization_aware_training_finalize -> keras_quantization_aware_training_finalize_experimental.
- Renaming KPI to ResourceUtilization - this includes changes to the KPI object class name, the kpi_data facade and the target_kpi argument in all the facades.
- Renaming Mixed Precision configuration and GPTQ configuration
- MixedPrecisionQuantizationConfigV2 -> MixedPrecisionQuantizationConfig.
- GradientPTQConfigV2 -> GradientPTQConfig.
- Renaming Keras data generation API
- tensorflow_data_generation_experimental -> keras_data_generation_experimental.
- get_tensorflow_data_generation_config -> get_keras_data_generation_config.
- Weights Attributes Quantization – weights quantization is now extended to support additional attributes beside the “kernel”. This is considered an experimental feature and the default behavior is kept unchanged (i.e., only the kernel of linear layers will be quantized with weights quantizer). Enabling this feature for specific attributes requires creating a modified TPC.
- weights_per_channel_threshold was removed from QuantizationConfig (it is no longer used and can be configured in the TPC).
- QuantizationMethod.KMEANS has been removed.
- The unused parameter hessians_n_iter has been removed from GPTQHessianScoresConfig.
- FolderImageLoader has been removed.
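For codebases that still use the pre-2.0.0 names, the one-to-one renames listed above can be applied mechanically. The following is a minimal sketch of such a migration helper; `migrate_source` and `RENAMES_2_0_0` are hypothetical names and not part of MCT. The KPI to ResourceUtilization change is left out on purpose, because the notes above do not spell out every new identifier.

```python
import re

# Old name -> new name, copied from the 2.0.0 breaking-changes list.
RENAMES_2_0_0 = {
    "keras_post_training_quantization_experimental": "keras_post_training_quantization",
    "keras_gradient_post_training_quantization_experimental": "keras_gradient_post_training_quantization",
    "pytorch_post_training_quantization_experimental": "pytorch_post_training_quantization",
    "pytorch_gradient_post_training_quantization_experimental": "pytorch_gradient_post_training_quantization",
    "keras_quantization_aware_training_init": "keras_quantization_aware_training_init_experimental",
    "keras_quantization_aware_training_finalize": "keras_quantization_aware_training_finalize_experimental",
    "MixedPrecisionQuantizationConfigV2": "MixedPrecisionQuantizationConfig",
    "GradientPTQConfigV2": "GradientPTQConfig",
    "tensorflow_data_generation_experimental": "keras_data_generation_experimental",
    "get_tensorflow_data_generation_config": "get_keras_data_generation_config",
}

# Word boundaries ensure that already-migrated names are not rewritten again.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted((re.escape(k) for k in RENAMES_2_0_0), key=len, reverse=True)) + r")\b"
)


def migrate_source(text: str) -> str:
    """Return `text` with every old identifier replaced by its 2.0.0 name."""
    return _PATTERN.sub(lambda m: RENAMES_2_0_0[m.group(1)], text)


old_line = "q_model, _ = mct.ptq.keras_post_training_quantization_experimental(model, data_gen)"
print(migrate_source(old_line))
```

Because identifiers are matched as whole words, running the helper twice on the same file leaves the second pass a no-op.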
Bug fixes:
- Fixed Python 3.9 Windows file permission error in Fakely TFlite exporter - #865
- Fixed MCT failing for torch.nn.functional.layer_norm - #921
- Fixed MCT failing for a "dims" parameter of type List in torch.permute - #935
New Contributors
Welcome @samuel-wj-chapman for his first contribution! PR #959
Full Changelog: v1.11.0...v2.0.0
Release v1.11.0
What's Changed
Major updates:
- Structured pruning for Keras models: MCT now employs structured and hardware-aware pruning. This pruning technique is designed to compress models for specific hardware architectures, taking into account the target platform's "Single Instruction, Multiple Data" (SIMD) capabilities.
- Additional details can be found here.
- Run a tutorial on Google Colab!
- Learned Step Size Quantization (LSQ) implementation for QAT. To understand how to use LSQ, please refer to our API documentation here.
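For readers unfamiliar with LSQ, the forward pass of a learned-step-size quantizer can be sketched in a few lines. This is a generic illustration of the technique and not MCT's implementation: in real QAT the step size is a trainable parameter updated by backpropagation (typically with a straight-through estimator for the rounding), which this scalar sketch does not show. The function name `lsq_fake_quantize` is hypothetical.

```python
def lsq_fake_quantize(x: float, step_size: float, n_bits: int = 8, signed: bool = True) -> float:
    """Quantize-dequantize a scalar on a uniform grid with spacing `step_size`."""
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    if signed:
        q_min, q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    else:
        q_min, q_max = 0, 2 ** n_bits - 1
    # Scale, clip to the integer range, round, then scale back.
    # Python's round() rounds exact halves to the nearest even integer.
    level = round(min(max(x / step_size, q_min), q_max))
    return level * step_size


print(lsq_fake_quantize(0.37, step_size=0.1, n_bits=4))   # lands on a grid point near 0.4
print(lsq_fake_quantize(5.0, step_size=0.1, n_bits=4))    # clipped to the top of the signed 4-bit range
```

Training adjusts `step_size` so that the grid trades off clipping error against rounding error for each layer.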
General changes:
- New tutorials were added! Nanodet-Plus, EfficientDet, and more. These tutorials and more can be found here.
- Support for new NN framework versions was added (Tensorflow v2.14 and Pytorch v2.1).
- Hessian scores used as sensitivity importance scores in mixed-precision, GPTQ, and pruning now support Hessian scoring w.r.t model's weights (in addition to previously supported Hessian w.r.t model's activations).
- Added support for external regularization factor in GPTQ. Please refer to the API for Keras and Pytorch usage.
- Custom layers in Keras, previously unsupported, are now skipped during quantization.
Breaking changes:
- Names of Hessian-related variables and methods have been revised:
- GPTQHessianWeightsConfig Changes:
- The class GPTQHessianWeightsConfig is renamed GPTQHessianScoresConfig.
- The parameter norm_weights is renamed norm_scores.
- New API can be found here.
- MixedPrecisionQuantizationConfigV2 Changes:
- The parameter use_grad_based_weights is renamed use_hessian_based_scores.
- The parameter norm_weights is renamed norm_scores.
- New API can be found here.
- Exporter changes: New QuantizationFormat 'MCTQ' exports models with mct-quantizers modules. Also, TPC should not be passed during export; instead, a QuantizationFormat is passed directly. For more details and updated usage examples, please see here.
- The output replacement mechanism has been eliminated from the Hessian computation. As a result, models with specific layer outputs, such as argmax, are now incompatible with the Hessian scoring metric in features like GPTQ and mixed-precision. So, Hessian scoring needs to be deactivated when using these features.
Bug fixes:
- Fixed a permission error during TensorFlow model export on Windows systems. #865 by @jgerityneurala.
- Fixed an issue with pickling torch models. [#841].
- Fixed an issue with systems operating with multiple CUDA devices. [#613].
- Fixed the unsupported NMS layer issue in mixed precision scenarios. [#844]
- Fixed an issue with PyTorch reshape substitute. [#799].
- Fixed an issue finalizing graph configuration following mixed-precision operations with mixed TPC. [#820].
- Tackled numeric problems in mixed precision caused by large values in the distance metric. Fixed by setting a threshold in the MP quantization configuration, ensuring that if a distance value exceeds this threshold, the metric is scaled down.
- Fixed an issue with reused TensorFlow SeparableConv2D decomposition concerning their reuse group.
- Fixed a bug in PyTorch BN folding into ConvTranspose2d with groups>1.
New Contributors
Welcome @jgerityneurala and @edenlum for their first contributions! PR #865, PR #873
Full Changelog: v1.10.0...v1.11.0
Release v1.10.0
What's Changed
Major Updates:
- Data Generation Library: The Data Generation Library has been added to the Model Compression Toolkit (MCT) project. This library allows users to generate synthetic data for compressing their models and enables quantization without requiring user-provided data. Check out an example of quantizing a model using generated data for torchvision's ResNet18 in this notebook.
General Changes:
- TensorFlow and PyTorch Support: Added support for TensorFlow 2.12 and 2.13, as well as PyTorch 2.0.
- Dependency Cleanup: All dependencies on 'tensorflow-model-optimization' have been removed.
- Quick-Start Tutorial: The quick-start tutorial has been updated with additional GPTQ and Mixed Precision (MP) options and minor bug fixes.
- New TPC: Added IMX500 TPC with weights quantized using non-uniform quantization (LookUp-Table).
Breaking Changes:
- Quantizer Identifier: Replaced the "quantizer_type" property with a new "identifier" property for all trainable quantizers. Each quantizer now has a dedicated identifier.
- Changes in Look-up Table (LUT) quantizers: In Keras and PyTorch:
- Class variable names have been modified to align with MCT Quantizers names:
- cluster_centers -> lut_values
- multiplier_n_bits -> lut_values_bitwidth
- lut_values is now converted from a numpy array to a list before exporting the model.
Added Features:
- Forward-Fold Layers: Added support for forward-folding BatchNorm and DW-Conv with 1x1 kernel layers for improved quantization.
- Zero in LUT grid: LUT now explicitly includes zero in the quantization grid.
Improvements:
- Quick-Start Enhancements: Improved quick-start for running pre-trained models in MCT.
- Notebook Addition: Added a notebook for running pre-trained models in MCT and a notebook for quantizing a model using images generated with the data generation library.
- Mixed Precision Quantization: Mixed precision quantization is now applied using MCT Quantizers infrastructure.
- Configurable Quantizer Classes: Introduced new 'ConfigurableWeightsQuantizer' and 'ConfigurableActivationQuantizer' quantizer classes to support mixed precision search, replacing the SelectiveQuantizer mechanism.
- BOPs Computation Fix: Fixed bit operations (BOPs) computation in mixed precision in the BOPs restriction scenario.
Fixed Issues:
- Param Search in SNC: Fixed param search during shift negative correction (SNC) in PyTorch [#771].
- Second Momentum Correction: Fixed second momentum correction when SNC is enabled [#771].
- Irrelevant Warning: Resolved an irrelevant warning related to the Kmeans function when running LUT quantization (no effect on the usability of the quantizer).
New Contributors
- @alexander-sony made their first contribution in #742
Contributors:
@alexander-sony @lior-dikstein @reuvenperetz @ofirgo @elad-c @eladc-git @haihabi @lapid92
Full Changelog: v1.9.0...v1.10.0
Release v1.9.1
Bug Fixes and Other Changes:
- Fixed an issue with the mct 1.9.0 requirements file that caused mct-quantizers 1.2.0 to be installed. The mct-quantizers version is now pinned to 1.1.0 in the mct requirements file to avoid this issue.
Release v1.9.0
What's Changed
Major updates
- MCT Quantizers:
- The Inferable Infrastructure package was extracted into an external repository, MCT Quantizers, and a new dependency was added to the project's requirements (mct_quantizers library, see requirements file).
- For changes in the quantized model structure, please refer to the latest release (v1.1.0) of the MCT Quantizers package. The latest changes include removing the activation quantizer from the “QuantizationWrapper” module, and replacing it with an “ActivationQuantizationHolder” that’s responsible for the activation quantization.
- The extraction of the inferable infrastructure package included a breaking change to the quantization infrastructure API: the quantization_infrastructure package is no longer part of MCT’s API. Its capabilities are split and can be accessed as follows:
- inferable_infrastructure components are available via the MCT Quantizers package. To access them use mct_quantizers.<Component> (after installing the mct-quantizers package in your environment).
- trainable_infrastructure components are available via MCT’s API. To access them use mct.trainable_infrastructure.<Component>.
- MCT Tutorials:
- The new tutorials package exposes a simple framework for getting started with MCT for model quantization. This project demonstrates the capabilities of MCT and illustrates its interface with various model collection libraries. It allows users to generate a quantized version of their chosen model with a single click by accessing a wide range of pre-trained models.
- Currently, the project supports a selection of models from each library. However, our ongoing goal is to continually expand the support, aiming to include more models in the future.
- In addition, all MCT tutorials and examples have been moved to the notebooks directory under this module.
- This release also includes several fixes to the existing MCT examples - new arguments and imports were fixed in QAT examples, Keras, and MobileNetv2 examples.
- Exporter API changes:
- Instead of directly specifying the data type (fakely-quantized or INT8) as a mode, we are now passing the TPC which contains the desired exported quantization format.
- The models can be exported in two quantization formats - Fake Quant format, where weights and activations are float fakely-quantized values, and a new INT8 format, where weights and activations are represented using 8 bits integers. The quantization format value is set in the TPC.
- A serialization format is now passed to the exporter. This update implies changes in how models are exported, allowing TensorFlow models to be exported as TensorFlow models (.h5 extension) and TFLite models (.tflite extension), and PyTorch models to be exported as torch script models and ONNX models (.onnx extension).
- The mct.exporter.keras_export_model() function is now being used instead of mct.exporter.tflite_export_model().
- API Rearrangement:
- We would like to inform you about breaking changes in the MCT's API that may affect your existing code. Functions and classes that were previously directly exposed are now organized under internal packages. The behavior of these functions remains unchanged; however, you will need to update how you access them.
- For example, what was previously accessed via mct.DebugConfig should now be accessed using mct.core.DebugConfig.
- The full list of changes is as follows:
- mct.core:
- DebugConfig
- keras_kpi_data, keras_kpi_data_experimental
- pytorch_kpi_data, pytorch_kpi_data_experimental
- FolderImageLoader
- FrameworkInfo, ChannelAxis
- DefaultDict
- network_editor
- quantization_config
- mixed_precision_quantization_config
- QuantizationConfig, QuantizationErrorMethod, DEFAULTCONFIG
- CoreConfig
- KPI
- MixedPrecisionQuantizationConfig, MixedPrecisionQuantizationConfigV2
- mct.qat:
- QATConfig, TrainingMethod
- keras_quantization_aware_training_init, keras_quantization_aware_training_finalize
- pytorch_quantization_aware_training_init, pytorch_quantization_aware_training_finalize
- mct.gptq:
- GradientPTQConfig, RoundingType, GradientPTQConfigV2
- keras_gradient_post_training_quantization_experimental
- get_keras_gptq_config
- pytorch_gradient_post_training_quantization_experimental
- get_pytorch_gptq_config
- mct.exporter:
- KerasExportSerializationFormat
- PytorchExportSerializationFormat
- keras_export_model
- pytorch_export_model
- mct.ptq:
- pytorch_post_training_quantization_experimental
- keras_post_training_quantization_experimental
- Please update your code accordingly to ensure compatibility with the latest version of MCT.
- Also, notice that the old functions keras_ptq, keras_ptq_mp, pytorch_ptq, and pytorch_ptq_mp are now deprecated and will be removed in the future. We highly recommend using keras_ptq_experimental and pytorch_ptq_experimental instead.
- The new_experimental_exporter flag is now set to True by default in keras_ptq_experimental, keras_gptq_experimental, pytorch_ptq_experimental and pytorch_gptq_experimental.
This change affects the returned quantized model MCT creates by wrapping the layers with the quantization information as detailed in MCT Quantizers library. There is no change during inference, and the quantized model usage is the same.
In addition, the new quantized model can be used for exporting the quantized model using the new experimental exporter.
General changes
- New symmetric soft rounding quantizers were added to Pytorch GPTQ, and a uniform soft rounding quantizer was added to both Pytorch and Keras GPTQ.
- GPTQ and QAT quantizer names have been modified with distinguishable suffixes (e.g., SymmetricSoftRounding --> SymmetricSoftRoundingGPTQ).
- Trainable variables grouping - all trainable quantizers now hold a mapping of their trainable parameters, connecting each of them to a specific group, to allow a cleaner and simpler training (and training loop implementation).
- Regularization API for trainable quantizers - extracting the regularization from the quantizer to a higher level. Now, each trainable quantizer defines its regularization function (see for example Keras soft quantizer regularization).
- New “DNN Quantization with Attention (DQA)” quantizer for QAT (Pytorch).
- GPTQ arguments:
- New GPTQHessianWeightsConfig class to provide the necessary arguments for computing the Hessian weights for GPTQ.
- New gptq_quantizer_params_override argument in GPTQ config, to allow parameters override.
- Moved TPC from Core package into an independent [target_platform_capabilities](https://github.com/sony/model_optimization/tree/main/model_compression_toolkit...
Release v1.8.0
What's Changed
Major updates:
- Quantization Aware Training is now supported in PyTorch (experimental):
- Training model: QAT training
- Finalize model (export): QAT finalize
- Explanation about our Pytorch QAT quantizers can be found here: Quantizers
- New method for exporting quantized models (experimental):
- You can now get INT8 TFLite models using MCT. Exporting fakely-quantized models is still supported. Please see Exporter for more information and usage examples.
General changes:
- Add Quantization Infrastructure (QI)
- The new infrastructure will make it easy to develop new quantizers:
- Support multi-training algorithms.
- Support Keras and Pytorch frameworks.
- Quantizers are divided into two types:
- Trainable Quantizers: quantizers for training
- Inferable Quantizers: quantizers for inference (deployment)
- Currently only Quantization Aware Training (QAT) is supported in the new infrastructure. For more information see: Quantization Infrastructure
- Support TensorFlow v2.11.
- Support NumPy v1.24: fix deprecated dtypes.
- Add IMX500 to TPC. Quantization methods are Symmetric Weights and Power-Of-2 activations. For getting the IMX500 TPC please use Get TargetPlatformCapabilities.
- Add Symmetric LUT quantization.
- Remove Gumbel-Rounding from GPTQ.
- Add Keras implementation of Soft-Rounding to GPTQ. Soft-Rounding is enabled by default. To change it please edit GPTQ Config.
Bug fixes:
- Remove unnecessary assert in Activation layer of type float64.
- Fix bugs and speed up gradients computation.
- Close issues:
Contributors
Full Changelog: v1.7.1...v1.8.0
Release v1.7.1
What's Changed
Bug fixes:
- Added outlier removal using Z threshold to Shift Negative Correction.
- Fixed mixed-precision issue for Pytorch models with multiple inputs.
- Fixed wrong KPI computation in mixed-precision when the model has reused layers.
- Fixed import error in statistics correction package. #470
Full Changelog: v1.7.0...v1.7.1
Release v1.7.0
What's Changed
Major updates:
- Changed API for new dataset iterator for MCT's representative dataset used for calibration iterations.
- For the representative dataset, you can now use a generator or an iterator class (or any Callable that implements __iter__ and __next__ methods).
- This affects the following facade methods:
- Argument n_iter from CoreConfig was removed, as it should be included in the representative dataset Callable
- A change in get_keras_gptq_config and get_pytorch_gptq_config: n_iter was replaced with n_epochs. Set n_epochs to the number of times the representative dataset is iterated in the GPTQ process. Notice that now, in each GPTQ epoch, the whole representative dataset is iterated, unlike the previous behavior where a single batch from the representative dataset was used per GPTQ iteration.
- keras_gradient_post_training_quantization_experimental and pytorch_gradient_post_training_quantization_experimental can now receive a dataset generator that differs from the one used for PTQ calibration and mixed-precision bit-width configuration search. If it is not passed while using GPTQ, the calibration dataset will be used for fine-tuning.
- Notice that the old API was not changed. n_iter can be used to set the number of iterations for the GPTQ process, and FolderImageLoader is still supported, and can be used for images loading, using the old API. However, using the new API is recommended.
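A minimal sketch of the generator-style representative dataset described above: the number of calibration iterations lives inside the callable instead of in CoreConfig. The factory name `make_representative_dataset` is hypothetical, and the convention of yielding a list with one entry per model input is our understanding of MCT's usage rather than something stated in these notes. In real code each batch would be a NumPy array; plain lists keep the sketch dependency-free.

```python
from typing import Callable, Iterator, List, Sequence


def make_representative_dataset(samples: Sequence,
                                batch_size: int,
                                n_iter: int) -> Callable[[], Iterator[List[list]]]:
    """Build a callable that yields `n_iter` batches each time it is called."""
    if not samples:
        raise ValueError("samples must not be empty")

    def representative_data_gen() -> Iterator[List[list]]:
        for i in range(n_iter):
            start = (i * batch_size) % len(samples)
            # Wrap around so every batch has exactly `batch_size` items.
            batch = [samples[(start + j) % len(samples)] for j in range(batch_size)]
            yield [batch]  # assumed convention: one list entry per model input

    return representative_data_gen


representative_data_gen = make_representative_dataset(list(range(10)), batch_size=4, n_iter=3)
for batch in representative_data_gen():
    print(batch)
```

Because the callable creates a fresh generator on every call, it can be iterated once per GPTQ epoch without being exhausted.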
General changes:
- GPTQ changes:
- Added new experimental GPTQ configuration class named GradientPTQV2.
- Added support for uniform quantizers during GPTQ for PyTorch models (experimental). Documentation for using uniform quantizers can be found here.
- Added usage of tf.function in GPTQ for fine-tuning TensorFlow models to improve GPTQ runtime.
- Mixed-Precision changes:
- Added a new refinement procedure for improving the mixed-precision bit-width configuration, enabled by default. It can be disabled using refine_mp_solution in MixedPrecisionQuantizationConfigV2.
- Improved mixed-precision configuration search runtime by refactoring the distance matrix computation and similarity analysis functions.
- Added second-moment correction (in addition to bias correction) for PyTorch and TensorFlow models. Can be enabled by setting weights_second_moment_correction to True when creating a QuantizationConfig (experimental).
- Added a search for improved shift and threshold in the shift negative correction algorithm. Can be enabled by setting shift_negative_params_search to True when creating a QuantizationConfig (experimental).
- Added support for TensorFlow v2.10, as well as PyTorch v1.13, v1.12
- Tested using multiple Python versions: 3.10, 3.9, 3.8, 3.7
- Removed LayerNormDecomposition substitution for TensorFlow models.
- Added support for PyTorch convolution functional API.
- Updated requirements file (excluding networkx v2.8.1). See requirements here.
- Added new tutorials. Find all MCT's tutorials here.
- Added a new look to our website! Check it out!
Bug fixes:
- Fixed small thresholds issue due to numerical issues. Changed calculation of PoT thresholds to be of type float64.
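To make the power-of-two (PoT) threshold computation concrete, here is a small sketch that rounds the largest absolute value of a tensor up to the next power of two. Python floats are double precision (float64), the dtype the fix above moved to. The function name `pot_threshold` and the `min_threshold` floor are assumptions for illustration and are not taken from MCT.

```python
import math


def pot_threshold(max_abs: float, min_threshold: float = 2.0 ** -16) -> float:
    """Smallest power of two that is >= max_abs, floored at `min_threshold`."""
    if max_abs <= 0.0:
        # Degenerate tensor (all zeros): fall back to the floor value.
        return min_threshold
    exponent = math.ceil(math.log2(max_abs))
    return max(2.0 ** exponent, min_threshold)


for value in (0.3, 1.0, 5.0, 1e-9):
    print(value, pot_threshold(value))
```

The floor keeps very small activations from producing thresholds so tiny that the quantization grid collapses.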
Contributors
New Contributors
- @tehiladaboush made their first contribution in #399 👏
Full Changelog: v1.6.0...v1.7.0
Release v1.6.0
What's Changed
Major updates:
- Added Keras Quantization-Aware-Training (QAT) support (experimental): 🥳
- Added new functions to prepare a Keras model for QAT and finalize a model after it was retrained.
- You can find a tutorial for using QAT here.
- Run this tutorial in Google Colab!
- The API can be found here and here.
- Added Gumbel-Rounding quantizer to Keras Gradient-Based PTQ (GPTQ) (experimental): 🎉
- A new quantizer can be used during GPTQ training and configured using GradientPTQConfig.
- Use get_keras_gptq_config documentation to easily create a GradientPTQConfig and start training using keras_gradient_post_training_quantization_experimental.
- Added initial support for GPTQ for PyTorch models (experimental). Please visit the GradientPTQConfig documentation and get_pytorch_gptq_config documentation for more details.
General changes:
- Added support for LUT Kmean quantizer for activations for Keras and PyTorch models.
- GPTQ changes:
- Added support for weighted loss in Keras GPTQ.
- Default values in GradientPTQConfig were re-set.
- API of get_keras_gptq_config was changed.
- Please visit the GradientPTQConfig documentation and get_keras_gptq_config documentation for more details.
- MixedPrecisionQuantizationConfigV2 default values were changed. Please visit the MixedPrecisionQuantizationConfigV2 documentation for more details.
- Added support for buffers in PyTorch models (they do not require gradients and are thus not registered as parameters).
- Added layer-replacement action in the network editor. You can find more actions to edit the network here.
- Added support for constraining a model's number of Bit-Operations (BOPs). For more KPI options, please visit our documentation
- New tutorials were added for GPTQ and QAT for Keras models, as well as tutorials for how to use LUT quantizers. You can find all tutorials here 👩🏫
Bug fixes:
- Replaced TensorFlowOpLayer with TFOpLambda in Shift Negative Correction for Keras models.
- Skipped GPTQ training when the number of iterations is set to 0.
- Fixed optimizer import from Keras facade to support TF2.9.
- Fixed name in the license.
Contributors
New Contributors
- @Idan-BenAmi made their first contribution in #323 👏
Full Changelog: v1.5.0...v1.6.0