v0.19.0

@angeloskath released this 18 Oct 19:35 (commit 58a8556)

Highlights

  • Speed improvements
    • Up to 6x faster CPU indexing
    • Faster Metal compiled kernels for strided inputs
    • Faster generation with fused-attention kernel
  • Gradient for grouped convolutions
  • Python 3.8 has reached end-of-life, so CI no longer tests against it

Core

  • New features
    • Gradient for grouped convolutions
    • mx.roll
    • mx.random.permutation
    • mx.real and mx.imag
  • Performance
    • Up to 6x faster CPU indexing
    • Faster CPU sort
    • Faster Metal compiled kernels for strided inputs
    • Faster generation with fused-attention kernel
    • Bulk eval in safetensors to avoid unnecessary serialization of work
  • Misc
    • Bump to nanobind 2.2
    • Move testing to Python 3.9 due to 3.8's end-of-life
    • Make the GPU device more thread safe
    • Fix the submodule stubs for better IDE support
    • CI-generated docs, so the published docs no longer go stale

NN

  • Add support for grouped 1D convolutions to the nn API
  • Add some missing type annotations

Bugfixes

  • Fix and speed up row-reduce with few rows
  • Fix normalization primitive segfault with unexpected inputs
  • Fix complex power on the GPU
  • Fix freeing deep unevaluated graphs
  • Fix race with array::is_available
  • Consistently handle softmax with all -inf inputs
  • Fix streams in affine quantize
  • Fix CPU compile preamble for some Linux machines
  • Stream safety in CPU compilation
  • Fix CPU compile segfault at program shutdown