Releases: sony/model_optimization

Release 2.1.0

28 May 08:05
9d3593f

What's Changed

General changes:

  • Quantization enhancements:
    • Improved quantization parameters: the threshold of concatenation layers is now backpropagated, minimizing data loss when quantizing these layers.
    • Improved weights quantization parameter selection: introduced a Hessian-based MSE (HMSE) quantization error method.
      • Set weights_error_method to QuantizationErrorMethod.HMSE in the QuantizationConfig within CoreConfig (see the sketch after this list).
      • Currently, this feature is only available in GPTQ due to the increased runtime required for Hessian computation.
    • Improved mixed precision: normalized MSE is now used as the distance metric in mixed-precision sensitivity evaluation for non-Hessian-based methods.
    • Improved mixed-precision runtime: added a validation step that determines whether quantizing the model to a requested target resource utilization requires mixed precision, or whether it can be achieved by quantizing the model to the maximal available bit-width.
    • Identity layers are now removed automatically to improve graph optimization.
  • Introduced TPC IMX500.v2:
    • Enabled a new feature: metadata. Metadata is a dictionary saved in the model file and object, containing information about the MCT environment (e.g., MCT version, framework version).
    • Quantize unfolded BatchNorm layers.
    • Default TPC remains IMX500.v1. For selecting IMX500.v2 use:
      • tpc_v2 = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version="v2")
      • mct.ptq.keras_post_training_quantization(model, representative_data_gen, target_platform_capabilities=tpc_v2)
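
A minimal sketch of wiring the HMSE selection into a Keras GPTQ call, assuming a Keras model and a representative_data_gen callable; it uses the facades named in these notes, with other arguments left at their defaults:

    import model_compression_toolkit as mct

    # Select the Hessian-based MSE error method for weights quantization parameters.
    qc = mct.core.QuantizationConfig(
        weights_error_method=mct.core.QuantizationErrorMethod.HMSE)
    core_config = mct.core.CoreConfig(quantization_config=qc)

    # HMSE is currently available only in GPTQ (Hessian computation adds runtime).
    quantized_model, quantization_info = mct.gptq.keras_gradient_post_training_quantization(
        model, representative_data_gen,
        core_config=core_config,
        gptq_config=mct.gptq.get_keras_gptq_config(n_epochs=50))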

Tutorials

Updates to the MCT tutorial notebooks:

  • Reorganized the tutorials into separate sections: IMX500 and MCT features.
  • Added new tutorials for IMX500: YOLOv8n object-detection quantization in Keras and PyTorch, including an optional Gradient-Based PTQ step for optimized performance.
  • Removed the “quick-start” integration tool from MCT.

Breaking changes:

  • TF 2.11 is no longer supported.

Bug fixes:

  • Fixed a bug in the GPTQ parameters update.
  • Fixed a bug in the similarity analyzer when bias correction is used.
  • Fixed a bug in logging tf.image.combined_non_max_suppression to TensorBoard (#1055).

Release 2.0.0

02 Apr 13:58
c4ccdc9

What's Changed

Major updates:

  • Structured pruning for PyTorch models: MCT now employs structured and hardware-aware pruning for PyTorch models in addition to Keras models.
  • API changes - The MCT "experimental" API has been formalized. Refer to the API documentation for details.
  • Quantization parameters search is now faster! Enhanced vectorized search for per-channel thresholds delivers quicker results.

General changes:

  • TensorFlow 2.15 is now supported.
  • Model Statistics Collector - Improved robustness to representative datasets:
    • The mean estimator has been switched from IIR to a standard mean.
    • The merged activations histogram now uses a fixed bin width.
  • PyTorch Model Export - the exporter API now includes an argument for specifying the ONNX opset version (the default opset is 15). View the updated API here.
  • Target Platform Capabilities (TPC) - 'torch.squeeze' operation has been added to the TPC’s “no quantization” list.
  • Mixed Precision - several modifications to improve mixed-precision usability and stability:
    • Hessian-based scores are now disabled by default (faster execution).
    • A new API enables different weighting methods through MixedPrecisionQuantizationConfig (the MpDistanceWeighting enum); see the sketch after this list.
    • A rare numerical issue in the distance metric computation has been fixed.
    • MixedPrecisionQuantizationConfig is now initialized by default in CoreConfig.
  • The internal representation of constants in a model has been modified, resolving issues #918 and #812.
  • MCT Troubleshooting and FAQ - check out the Quantization Troubleshooting for common pitfalls and some tools to improve quantization accuracy as well as the FAQ for common issues.
  • MCT tutorials – additional tutorials have been added to provide insights into MCT features, such as a z-score threshold tutorial and a quantization parameters search tutorial. In addition, a new tutorial demonstrating semantic segmentation model quantization has been added.
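
A sketch of the new weighting API, assuming the parameter is named distance_weighting_method and that MpDistanceWeighting exposes an AVG member (both inferred, not confirmed by these notes):

    import model_compression_toolkit as mct

    # Choose how per-layer distances are weighted in the sensitivity evaluation.
    mp_config = mct.core.MixedPrecisionQuantizationConfig(
        use_hessian_based_scores=False,  # the new default (faster execution)
        distance_weighting_method=mct.core.MpDistanceWeighting.AVG)
    core_config = mct.core.CoreConfig(mixed_precision_config=mp_config)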

Breaking changes:

  • API changes
    • Removal of “experimental” from PTQ and GPTQ quantization facades
      • keras_post_training_quantization_experimental -> keras_post_training_quantization
      • keras_gradient_post_training_quantization_experimental -> keras_gradient_post_training_quantization
      • pytorch_post_training_quantization_experimental -> pytorch_post_training_quantization
      • pytorch_gradient_post_training_quantization_experimental -> pytorch_gradient_post_training_quantization
    • Renaming QAT methods
      • keras_quantization_aware_training_init -> keras_quantization_aware_training_init_experimental.
      • keras_quantization_aware_training_finalize -> keras_quantization_aware_training_finalize_experimental.
    • Renaming KPI to ResourceUtilization - this includes changes to the KPI object class name, the kpi_data facade, and the target_kpi argument in all facades (see the migration sketch after this list).
    • Renaming Mixed Precision configuration and GPTQ configuration
      • MixedPrecisionQuantizationConfigV2 -> MixedPrecisionQuantizationConfig.
      • GradientPTQConfigV2 -> GradientPTQConfig.
    • Renaming Keras data generation API
      • tensorflow_data_generation_experimental -> keras_data_generation_experimental.
      • get_tensorflow_data_generation_config -> get_keras_data_generation_config.
  • Weights Attributes Quantization – weights quantization has been extended to support additional attributes besides the “kernel”. This is an experimental feature, and the default behavior is unchanged (i.e., only the kernel of linear layers is quantized with a weights quantizer). Enabling this feature for specific attributes requires creating a modified TPC.
  • weights_per_channel_threshold was removed from QuantizationConfig (it is no longer used and can be configured in the TPC).
  • QuantizationMethod.KMEANS has been removed.
  • Unused parameter hessians_n_iter has been removed from GPTQHessianScoresConfig.
  • FolderImageLoader has been removed.
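
A migration sketch for the renames above; the weights-memory figure is illustrative, and the target_resource_utilization argument name is inferred from the KPI -> ResourceUtilization rename:

    import model_compression_toolkit as mct

    # v1.11:
    # quantized_model, info = mct.ptq.keras_post_training_quantization_experimental(
    #     model, representative_data_gen, target_kpi=mct.core.KPI(weights_memory=10_000_000))

    # v2.0:
    target_ru = mct.core.ResourceUtilization(weights_memory=10_000_000)  # formerly KPI
    quantized_model, info = mct.ptq.keras_post_training_quantization(
        model, representative_data_gen,
        target_resource_utilization=target_ru)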

Bug fixes:

  • Fixed a Python 3.9 Windows file-permission error in the fakely-quantized TFLite exporter (#865).
  • Fixed an MCT failure on torch.nn.functional.layer_norm (#921).
  • Fixed an MCT failure when the "dims" parameter of torch.permute is of type List (#935).

New Contributors

Welcome @samuel-wj-chapman for his first contribution! PR #959

Full Changelog: v1.11.0...v2.0.0

Release v1.11.0

03 Jan 13:32
bca5634

What's Changed

Major updates:

  • Structured pruning for Keras models: MCT now employs structured, hardware-aware pruning. This pruning technique is designed to compress models for specific hardware architectures, taking into account the target platform's "Single Instruction, Multiple Data" (SIMD) capabilities; see the sketch after this list.

  • Learned Step Size Quantization (LSQ) implementation for QAT. To understand how to use LSQ, please refer to our API documentation here.
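
A minimal sketch of the Keras pruning flow, assuming the facade is mct.pruning.keras_pruning_experimental and that the compression target is expressed as a KPI (this release predates the ResourceUtilization rename); parameter names are assumptions:

    import model_compression_toolkit as mct

    # Prune to roughly half the original weights memory (illustrative target).
    target_kpi = mct.core.KPI(weights_memory=dense_model_size_in_bytes * 0.5)

    pruned_model, pruning_info = mct.pruning.keras_pruning_experimental(
        model=model,
        target_kpi=target_kpi,
        representative_data_gen=representative_data_gen,
        pruning_config=mct.pruning.PruningConfig(num_score_approximations=32))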

General changes:

  • New tutorials were added: Nanodet-Plus, EfficientDet, and more. All tutorials can be found here.
  • Support for new NN framework versions was added (TensorFlow v2.14 and PyTorch v2.1).
  • Hessian scores, used as sensitivity importance scores in mixed precision, GPTQ, and pruning, now support scoring w.r.t. the model's weights (in addition to the previously supported scoring w.r.t. the model's activations).
  • Added support for an external regularization factor in GPTQ. Please refer to the API for Keras and PyTorch usage.
  • Custom layers in Keras, previously unsupported, are now skipped during quantization.

Breaking changes:

  • Names of Hessian-related variables and methods have been revised:

    • GPTQHessianWeightsConfig Changes:
      • The class GPTQHessianWeightsConfig is renamed GPTQHessianScoresConfig.
      • The parameter norm_weights is renamed norm_scores.
      • New API can be found here.
    • MixedPrecisionQuantizationConfigV2 Changes:
      • The parameter use_grad_based_weights is renamed use_hessian_based_scores.
      • The parameter norm_weights is renamed norm_scores.
      • New API can be found here.
  • Exporter changes: the new QuantizationFormat 'MCTQ' exports models with mct-quantizers modules. Also, a TPC should no longer be passed during export; instead, a QuantizationFormat is passed directly (see the sketch after this list). For more details and updated usage examples, please see here.

  • The output replacement mechanism has been eliminated from the Hessian computation. As a result, models with certain layer outputs, such as argmax, are now incompatible with the Hessian scoring metric in features like GPTQ and mixed precision; Hessian scoring must be disabled when using these features with such models.
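
An export sketch using the new format. QuantizationFormat.MCTQ is the format named above; the KERAS_H5 serialization member, the keyword names, and QuantizationFormat being reachable under mct.exporter are assumptions:

    import model_compression_toolkit as mct

    mct.exporter.keras_export_model(
        model=quantized_model,
        save_model_path='quantized_model.h5',
        serialization_format=mct.exporter.KerasExportSerializationFormat.KERAS_H5,
        quantization_format=mct.exporter.QuantizationFormat.MCTQ)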

Bug fixes:

  • Fixed a permission error during TensorFlow model export on Windows systems (#865, by @jgerityneurala).
  • Fixed an issue with pickling torch models (#841).
  • Fixed an issue on systems with multiple CUDA devices (#613).
  • Fixed the unsupported NMS layer issue in mixed-precision scenarios (#844).
  • Fixed an issue with the PyTorch reshape substitution (#799).
  • Fixed an issue finalizing the graph configuration following mixed-precision operations with a mixed TPC (#820).
  • Fixed numerical problems in mixed precision caused by large values in the distance metric: a threshold was added to the MP quantization configuration, and any distance value exceeding it is scaled down.
  • Fixed an issue with reused TensorFlow SeparableConv2D decomposition with respect to their reuse group.
  • Fixed a bug in PyTorch BN folding into ConvTranspose2d with groups > 1.

New Contributors

Welcome @jgerityneurala and @edenlum for their first contributions! PR #865, PR #873

Full Changelog: v1.10.0...v1.11.0

Release v1.10.0

27 Sep 13:03
6695985

What's Changed

Major Updates:

  • Data Generation Library: The Data Generation Library has been added to the Model Compression Toolkit (MCT) project. It allows users to generate synthetic data for compressing their models, enabling quantization without user-provided data; see the sketch below. Check out an example of quantizing torchvision's ResNet18 using generated data in this notebook.
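
A sketch of the data-free flow for torchvision's ResNet18; function and keyword names are assumptions based on the library's naming elsewhere in these notes, and the representative-data plumbing is schematic:

    import model_compression_toolkit as mct
    from torchvision.models import resnet18

    model = resnet18(pretrained=True)

    # Generate synthetic calibration images from the model itself (no user data).
    config = mct.data_generation.get_pytorch_data_generation_config()
    images = mct.data_generation.pytorch_data_generation_experimental(
        model=model, n_images=128, output_image_size=224,
        data_generation_config=config)

    # Use the generated images as the representative dataset for PTQ.
    quantized_model, _ = mct.ptq.pytorch_post_training_quantization_experimental(
        model, representative_data_gen=lambda: [images])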

General Changes:

  • TensorFlow and PyTorch Support: Added support for TensorFlow 2.12 and 2.13, as well as PyTorch 2.0.
  • Dependency Cleanup: All dependencies on 'tensorflow-model-optimization' have been removed.
  • Quick-Start Tutorial: The quick-start tutorial has been updated with additional GPTQ and Mixed Precision (MP) options and minor bug fixes.
  • New TPC: Added an IMX500 TPC with weights quantized using non-uniform quantization (Look-Up Table).

Breaking Changes:

  • Quantizer Identifier: Replaced the "quantizer_type" property with a new "identifier" property for all trainable quantizers. Each quantizer now has a dedicated identifier.
  • Changes in Look-Up Table (LUT) quantizers (Keras and PyTorch):
    • Class variable names have been modified to align with MCT Quantizers naming:
      • cluster_centers -> lut_values
      • multiplier_n_bits -> lut_values_bitwidth
    • lut_values is now converted from a numpy array to a list before the model is exported.

Added Features:

  • Forward-Fold Layers: Added support for forward-folding BatchNorm layers and DW-Conv layers with 1x1 kernels for improved quantization.
  • Zero in LUT grid: LUT now explicitly includes zero in the quantization grid.

Improvements:

  • Quick-Start Enhancements: Improved quick-start for running pre-trained models in MCT.
  • Notebook Addition: Added a notebook for running pre-trained models in MCT and a notebook for quantizing a model using images generated with the data generation library.
  • Mixed Precision Quantization: Mixed precision quantization is now applied using MCT Quantizers infrastructure.
  • BOPs Computation Fix: Fixed bit operations (BOPs) computation in mixed precision in the BOPs restriction scenario.

Fixed Issues:

  • Param Search in SNC: Fixed the parameter search during shift negative correction (SNC) in PyTorch (#771).
  • Second Momentum Correction: Fixed the second-moment correction when SNC is enabled (#771).
  • Irrelevant Warning: Resolved an irrelevant warning related to the Kmeans function when running LUT quantization (no effect on the usability of the quantizer).

Contributors:

@alexander-sony @lior-dikstein @reuvenperetz @ofirgo @elad-c @eladc-git @haihabi @lapid92
Full Changelog: v1.9.0...v1.10.0

Release v1.9.1

01 Aug 12:44

Bug Fixes and Other Changes:

  • Fixed an issue in the MCT 1.9.0 requirements file that caused installation of mct-quantizers 1.2.0. The mct-quantizers version is now pinned to 1.1.0 in the MCT requirements file to avoid this issue.

Release v1.9.0

19 Jun 07:16
5f2e2d0

What's Changed

Major updates

  • MCT Quantizers:

    • The Inferable Infrastructure package was extracted into an external repository, MCT Quantizers, and a new dependency was added to the project's requirements (the mct_quantizers library; see the requirements file).

    • For changes in the quantized model structure, please refer to the latest release (v1.1.0) of the MCT Quantizers package. The latest changes include removing the activation quantizer from the “QuantizationWrapper” module, and replacing it with an “ActivationQuantizationHolder” that’s responsible for the activation quantization.

    • The extraction of the inferable infrastructure package included a breaking change to the quantization infrastructure API - the quantization_infrastructure package is no longer part of MCT’s API. Its capabilities are split and can be accessed as follows:

      • inferable_infrastructure components are available via the MCT Quantizers package. To access them, use mct_quantizers.<Component> (after installing the mct-quantizers package in your environment).
      • trainable_infrastructure components are available via MCT’s API. To access them, use mct.trainable_infrastructure.<Component>.
  • MCT Tutorials:

    • The new tutorials package exposes a simple framework for getting started with MCT for model quantization. It demonstrates MCT's capabilities and illustrates its interface with various model-collection libraries, allowing users to generate a quantized version of a chosen model with a single click from a wide range of pre-trained models.
    • Currently, the project supports a selection of models from each library; our ongoing goal is to continually expand this support to include more models in the future.
    • In addition, all MCT tutorials and examples have been moved to the notebooks directory under this module.
      • This release also includes several fixes to the existing MCT examples - new arguments and imports were fixed in the QAT, Keras, and MobileNetV2 examples.
  • Exporter API changes:

    • Instead of directly specifying the data type (fakely-quantized or INT8) as a mode, the TPC, which contains the desired exported quantization format, is now passed.
    • Models can be exported in two quantization formats - a Fake Quant format, where weights and activations are float fakely-quantized values, and a new INT8 format, where weights and activations are represented using 8-bit integers. The quantization format value is set in the TPC.
    • A serialization format is now passed to the exporter. This changes how models are exported, allowing TensorFlow models to be exported as TensorFlow models (.h5 extension) and TFLite models (.tflite extension), and PyTorch models to be exported as torch script models and ONNX models (.onnx extension).
    • The mct.exporter.keras_export_model() function replaces mct.exporter.tflite_export_model().
  • API Rearrangement:

    • We would like to inform you about breaking changes in MCT's API that may affect your existing code. Functions and classes that were previously exposed directly are now organized under internal packages. Their behavior is unchanged; however, you will need to update how you access them.
    • For example, what was previously accessed via mct.DebugConfig should now be accessed using mct.core.DebugConfig (see the migration sketch after this list).
    • The full list of changes is as follows:
      • mct.core:
        • DebugConfig
        • keras_kpi_data, keras_kpi_data_experimental
        • pytorch_kpi_data, pytorch_kpi_data_experimental
        • FolderImageLoader
        • FrameworkInfo, ChannelAxis
        • DefaultDict
        • network_editor
        • quantization_config
        • mixed_precision_quantization_config
        • QuantizationConfig, QuantizationErrorMethod, DEFAULTCONFIG
        • CoreConfig
        • KPI
        • MixedPrecisionQuantizationConfig, MixedPrecisionQuantizationConfigV2
      • mct.qat:
        • QATConfig, TrainingMethod
        • keras_quantization_aware_training_init, keras_quantization_aware_training_finalize
        • pytorch_quantization_aware_training_init, pytorch_quantization_aware_training_finalize
      • mct.gptq:
        • GradientPTQConfig, RoundingType, GradientPTQConfigV2
        • keras_gradient_post_training_quantization_experimental
        • get_keras_gptq_config
        • pytorch_gradient_post_training_quantization_experimental
        • get_pytorch_gptq_config
      • mct.exporter:
        • KerasExportSerializationFormat
        • PytorchExportSerializationFormat
        • keras_export_model
        • pytorch_export_model
      • mct.ptq:
        • pytorch_post_training_quantization_experimental
        • keras_post_training_quantization_experimental
    • Please update your code accordingly to ensure compatibility with the latest version of MCT.
    • Also, note that the old functions keras_ptq, keras_ptq_mp, pytorch_ptq, and pytorch_ptq_mp are deprecated and will be removed in the future. We highly recommend using keras_ptq_experimental and pytorch_ptq_experimental instead.
  • The new_experimental_exporter flag is now set to True by default in keras_ptq_experimental, keras_gptq_experimental, pytorch_ptq_experimental, and pytorch_gptq_experimental.
    This change affects the quantized model MCT returns: its layers are wrapped with the quantization information, as detailed in the MCT Quantizers library. There is no change during inference, and the quantized model is used in the same way.
    In addition, the new quantized model can be exported using the new experimental exporter.
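
A minimal before/after sketch of the rearrangement (the debug_config keyword on CoreConfig is an assumption):

    import model_compression_toolkit as mct

    # Before v1.9:
    # debug_config = mct.DebugConfig()

    # From v1.9 on:
    debug_config = mct.core.DebugConfig()
    core_config = mct.core.CoreConfig(debug_config=debug_config)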

Release v1.8.0

08 Feb 14:29
8d49e2c

What's Changed

Major updates:

  • Quantization Aware Training is now supported in PyTorch (experimental).

  • New method for exporting quantized models (experimental):

    • You can now get INT8 TFLite models using MCT. Exporting fakely-quantized models is still supported.
      Please see Exporter for more information and usage examples.

General changes:

  • Added the Quantization Infrastructure (QI)
    • The new infrastructure makes it easy to develop new quantizers:
      • Supports multiple training algorithms.
      • Supports the Keras and PyTorch frameworks.
      • Quantizers are divided into two types:
        • Trainable quantizers: quantizers for training.
        • Inferable quantizers: quantizers for inference (deployment).
    • Currently, only Quantization Aware Training (QAT) is supported in the new infrastructure.
      For more information see: Quantization Infrastructure
  • Support for TensorFlow v2.11.
  • Support for NumPy v1.24: fixed deprecated dtypes.
  • Added IMX500 to the TPC. The quantization methods are symmetric weights and power-of-two activations. To get the IMX500 TPC, please use Get TargetPlatformCapabilities (see the sketch after this list).
  • Added symmetric LUT quantization.
  • Removed Gumbel-Rounding from GPTQ.
  • Added a Keras implementation of Soft-Rounding to GPTQ. Soft-Rounding is enabled by default; to change it, please edit the GPTQ Config.
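
A sketch of fetching the IMX500 TPC, consistent with the facade call shown in later releases:

    import model_compression_toolkit as mct

    # Symmetric weights and power-of-two activations, per the entry above.
    tpc = mct.get_target_platform_capabilities('tensorflow', 'imx500')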

Bug fixes:

  • Removed an unnecessary assert in Activation layers of type float64.
  • Fixed bugs in, and sped up, gradient computation.
  • Closed issues:
    • Integration with TFLite #528
    • Will MCT support int8 quantization? #273

Full Changelog: v1.7.1...v1.8.0

Release v1.7.1

14 Dec 08:25
04bd851

What's Changed

Bug fixes:

  • Added outlier removal using a Z threshold to Shift Negative Correction (see the sketch after this list).
  • Fixed a mixed-precision issue for PyTorch models with multiple inputs.
  • Fixed wrong KPI computation in mixed precision when the model has reused layers.
  • Fixed an import error in the statistics correction package. #470
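
A sketch of configuring the Z threshold mentioned in the first fix; the z_threshold field name on QuantizationConfig and its value here are assumptions:

    import model_compression_toolkit as mct

    # Remove activation outliers beyond 8 standard deviations before SNC (illustrative).
    qc = mct.QuantizationConfig(z_threshold=8.0)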

Full Changelog: v1.7.0...v1.7.1

Release v1.7.0

01 Dec 12:19
caffdf7

What's Changed

General changes:

  • GPTQ changes:

    • Added a new experimental GPTQ configuration class named GradientPTQConfigV2.
    • Added support for uniform quantizers during GPTQ for PyTorch models (experimental). For using uniform quantizers, documentation can be found here.
    • Added usage of tf.function in GPTQ for fine-tuning TensorFlow models to improve GPTQ runtime.
  • Mixed-Precision changes:

    • Added a new refinement procedure that improves the mixed-precision bit-width configuration; it is enabled by default and can be disabled using refine_mp_solution in MixedPrecisionQuantizationConfigV2 (see the sketch after this list).
    • Improved the mixed-precision configuration search runtime by refactoring the distance-matrix computation and the similarity analysis functions.
  • Added second-moment correction (in addition to bias correction) for PyTorch and TensorFlow models.
    It can be enabled by setting weights_second_moment_correction to True when creating a QuantizationConfig (experimental).

  • Added a search for an improved shift and threshold in the shift negative correction algorithm. It can be enabled by setting shift_negative_params_search to True when creating a QuantizationConfig (experimental); both options are shown in the sketch after this list.

  • Added support for TensorFlow v2.10, as well as PyTorch v1.13 and v1.12.

  • Tested using multiple Python versions: 3.10, 3.9, 3.8, and 3.7.

  • Removed LayerNormDecomposition substitution for TensorFlow models.

  • Added support for the PyTorch convolution functional API.

  • Updated the requirements file (excluding networkx v2.8.1). See the requirements here.

  • Added new tutorials. Find all MCT's tutorials here.

  • Added a new look to our website! Check it out!
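
A sketch of enabling the experimental options named above; other arguments are left at their defaults, and the classes are accessed directly on mct since this release predates the v1.9 API rearrangement:

    import model_compression_toolkit as mct

    # Two experimental QuantizationConfig switches from this release:
    qc = mct.QuantizationConfig(
        weights_second_moment_correction=True,  # second-moment correction
        shift_negative_params_search=True)      # improved shift/threshold search in SNC

    # The mixed-precision refinement step is on by default and can be disabled:
    mp_config = mct.MixedPrecisionQuantizationConfigV2(refine_mp_solution=False)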

Bug fixes:

  • Fixed an issue with small thresholds caused by numerical imprecision; the calculation of PoT thresholds now uses float64.


Full Changelog: v1.6.0...v1.7.0

Release v1.6.0

22 Sep 16:01
121efce

What's Changed

Major updates:

  • Added Keras Quantization-Aware-Training (QAT) support (experimental). 🥳

  • Added a Gumbel-Rounding quantizer to Keras Gradient-Based PTQ (GPTQ) (experimental). 🎉

  • Added initial support for GPTQ for PyTorch models (experimental). Please visit the GradientPTQConfig documentation and get_pytorch_gptq_config documentation for more details.

General changes:

  • Added support for a LUT K-means quantizer for activations in Keras and PyTorch models.
  • GPTQ changes:
    • Added support for weighted loss in Keras GPTQ.
    • Default values in GradientPTQConfig were revised.
    • The API of get_keras_gptq_config was changed.
    • Please visit the GradientPTQConfig documentation and get_keras_gptq_config documentation for more details.
  • MixedPrecisionQuantizationConfigV2 default values were changed. Please visit the MixedPrecisionQuantizationConfigV2 documentation for more details.
  • Added support for buffers in PyTorch models (they do not require gradients and are thus not registered as parameters).
  • Added layer-replacement action in the network editor. You can find more actions to edit the network here.
  • Added support for constraining a model's number of Bit-Operations (BOPs); see the sketch after this list. For more KPI options, please visit our documentation.
  • New tutorials were added for GPTQ and QAT for Keras models, as well as tutorials for how to use LUT quantizers. You can find all tutorials here 👩‍🏫
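
A sketch of a BOPs constraint via the KPI object; the bops field name is an assumption and the value is illustrative:

    import model_compression_toolkit as mct

    # Cap the model at ~2 giga bit-operations during mixed-precision search.
    target_kpi = mct.KPI(bops=2e9)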

Bug fixes:

  • Replaced TensorFlowOpLayer with TFOpLambda in Shift Negative Correction for Keras models.
  • Skipped GPTQ training when the number of iterations is set to 0.
  • Fixed optimizer import from Keras facade to support TF2.9.
  • Fixed a name in the license.


Full Changelog: v1.5.0...v1.6.0