coremltools 8.0b1

Pre-release

@YifanShenSZ released this 10 Jun 19:09

For all the new features, find the updated documentation in the docs-guides.

  • New utilities coremltools.utils.MultiFunctionDescriptor() and coremltools.utils.save_multifunction for creating an mlprogram with multiple functions that can share weights. The model loading API has been updated to load a specific function for prediction. A sketch follows below.
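A minimal sketch of the multifunction workflow, assuming two previously saved mlpackages (the paths and function names below are illustrative; see the docs-guides for authoritative usage):

```python
import coremltools as ct

# Describe the functions to merge: each entry names a source mlpackage,
# the function to take from it, and its name in the combined model.
desc = ct.utils.MultiFunctionDescriptor()
desc.add_function(
    "adapter_1.mlpackage",  # hypothetical source model
    src_function_name="main",
    target_function_name="adapter_1",
)
desc.add_function(
    "adapter_2.mlpackage",  # hypothetical source model
    src_function_name="main",
    target_function_name="adapter_2",
)
desc.default_function_name = "adapter_1"

# Save a single mlpackage; identical weights are shared between functions.
ct.utils.save_multifunction(desc, "combined.mlpackage")

# Load a specific function for prediction.
model = ct.models.MLModel("combined.mlpackage", function_name="adapter_2")
```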
  • Stateful Core ML models: updates to the converter to produce Core ML models with the State type (a new type introduced in iOS18/macOS15). A sketch follows below.
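A minimal sketch of converting a stateful torch model, where a registered buffer becomes the Core ML state (the toy module and names are illustrative):

```python
import torch
import coremltools as ct

class Accumulator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A registered buffer holds state that persists across predictions.
        self.register_buffer("accumulator", torch.zeros(1))

    def forward(self, x):
        self.accumulator += x  # in-place update of the buffer
        return self.accumulator * x

traced = torch.jit.trace(Accumulator().eval(), torch.zeros(1))

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1,))],
    # Declare the buffer as a Core ML state; the State type requires
    # the iOS18/macOS15 deployment target.
    states=[ct.StateType(wrapped_type=ct.TensorType(shape=(1,)), name="accumulator")],
    minimum_deployment_target=ct.target.iOS18,
)
```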
  • coremltools.optimize
    • Updates to model representation (mlprogram) pertaining to compression:
      • Support compression with more granularities: blockwise quantization, grouped channelwise palettization
      • 4-bit weight quantization (in addition to the already supported 8-bit quantization)
      • 3-bit palettization (in addition to the already supported 1-, 2-, 4-, 6-, and 8-bit palettization)
      • Support joint compression modes:
        • 8-bit look-up tables for palettization
        • Ability to combine weight pruning and palettization
        • Ability to combine weight pruning and quantization
    • API updates:
      • coremltools.optimize.coreml
        • Updated existing APIs to account for the features mentioned above (a sketch follows below)
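A minimal sketch of the updated weight-compression APIs, assuming a loaded mlprogram model (the path is illustrative, and the parameter values are just examples):

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

mlmodel = ct.models.MLModel("model.mlpackage")  # hypothetical path

# 4-bit blockwise weight quantization.
quant_config = cto.OptimizationConfig(
    global_config=cto.OpLinearQuantizerConfig(
        mode="linear_symmetric",
        dtype="int4",
        granularity="per_block",
        block_size=32,
    )
)
quantized_model = cto.linear_quantize_weights(mlmodel, quant_config)

# Grouped channelwise palettization with a 3-bit look-up table.
palettize_config = cto.OptimizationConfig(
    global_config=cto.OpPalettizerConfig(
        nbits=3,
        granularity="per_grouped_channel",
        group_size=16,
    )
)
palettized_model = cto.palettize_weights(mlmodel, palettize_config)
```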
        • Support joint compression by applying compression techniques on an already compressed model
        • A new API to support activation quantization using calibration data, which can be used to take a W16A16 Core ML model and produce a W8A8 model: ct.optimize.coreml.experimental.linear_quantize_activations (see the sketch below)
          • (to be upgraded from experimental to the official namespace in a future release)
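A minimal sketch of the W16A16 → W8A8 flow; the config class name and the shape of the calibration data reflect my reading of the experimental API and may change (the input name "x", calibration_tensors, and mlmodel, a loaded W16A16 mlprogram, are hypothetical):

```python
import coremltools.optimize.coreml as cto

# Calibration data: a list of input dicts keyed by the model's input names.
sample_data = [{"x": t.numpy()} for t in calibration_tensors]  # hypothetical data

# Quantize activations to 8 bits using the calibration data...
act_config = cto.OptimizationConfig(
    global_config=cto.experimental.OpActivationLinearQuantizerConfig(
        mode="linear_symmetric"
    )
)
w16a8_model = cto.experimental.linear_quantize_activations(
    mlmodel, act_config, sample_data
)

# ...then quantize the weights as well to arrive at W8A8.
weight_config = cto.OptimizationConfig(
    global_config=cto.OpLinearQuantizerConfig(mode="linear_symmetric")
)
w8a8_model = cto.linear_quantize_weights(w16a8_model, weight_config)
```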
      • coremltools.optimize.torch
        • Updated existing APIs to account for features mentioned above
        • Added new APIs for data-free compression (PostTrainingPalettizer, PostTrainingQuantizer); a sketch follows below
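A minimal sketch of data-free palettization with PostTrainingPalettizer (the toy model and config values are illustrative):

```python
import torch
from coremltools.optimize.torch.palettization import (
    PostTrainingPalettizer,
    PostTrainingPalettizerConfig,
)

# Any float torch model works; no data or training loop is needed.
torch_model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# 4-bit k-means palettization applied directly to the weights.
config = PostTrainingPalettizerConfig.from_dict(
    {"global_config": {"n_bits": 4}}
)
palettizer = PostTrainingPalettizer(torch_model, config)
palettized_model = palettizer.compress()
```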
        • Added new APIs for calibration-data-based compression (SKMPalettizer for the sensitive k-means palettization algorithm, layerwise_compression for the GPTQ/SparseGPT quantization/pruning algorithms)
        • Updated these APIs and the coremltools.convert implementation so that torch models compressed with ct.optimize.torch can be converted without providing additional pass-pipeline arguments
  • iOS18 / macOS15 ops
    • Compression-related ops: constexpr_blockwise_shift_scale, constexpr_lut_to_dense, constexpr_sparse_to_dense, etc.
    • Updates to the GRU op
    • PyTorch op scaled_dot_product_attention
  • Experimental torch.export conversion support
```python
import torch
import torchvision

import coremltools as ct

# Load a pretrained Vision Transformer from torchvision.
torch_model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")

# Capture the model as an ExportedProgram via torch.export.
x = torch.rand((1, 3, 224, 224))
example_inputs = (x,)
exported_program = torch.export.export(torch_model, example_inputs)

# Convert the ExportedProgram directly to Core ML.
coreml_model = ct.convert(exported_program)
```
  • Various other bug fixes, enhancements, clean-ups, and optimizations

Known Issues

  • Conversion will fail when using certain palettization modes (e.g. int8 LUT, vector palettization) with torch models compressed via ct.optimize.torch
  • Some of the joint compression modes, when used with the training-time APIs in ct.optimize.torch, will result in a torch model that is not converted correctly
  • The post-training palettization config for mlpackage models (ct.optimize.coreml.OpPalettizerConfig) does not yet have all the arguments that are supported in the cto.torch.palettization APIs (e.g. lut_dtype to get an int8-dtyped LUT, cluster_dim to do vector palettization, enable_per_channel_scale to apply a per-channel scale, etc.)
  • Applying symmetric quantization with the GPTQ algorithm via ct.optimize.torch.layerwise_compression.LayerwiseCompressor will not produce the correct quantization scales, due to a known bug. This may lead to poor accuracy for the quantized model

Special thanks to our external contributors for this release: @teelrabbit @igeni @Cyanosite