Support BF16, Speed Up FP16, AVX512, Full Quantization

Pre-release
@jxt1234 jxt1234 released this 21 Apr 01:50
· 562 commits to master since this release
2e17ca8

Features:

  1. Add BF16 support, optimized for ARMv7a / ARMv8 only; set MNN_SUPPORT_BF16 to enable it.
  2. Refactor FP16 (ARM82) backend: 10%-20% speedup on benchmark models; add deconvolution / matmul support.
  3. Support full Int8 compute: remove the Int8-to-float conversion for data-reorder ops (in test).
  4. Support AVX512 for float compute.
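BF16 is gated by the MNN_SUPPORT_BF16 option named in item 1. A minimal sketch of enabling it, assuming MNN's usual out-of-source CMake build (the option name comes from this release; the rest of the invocation is the generic CMake pattern, not confirmed MNN-specific flags):

```shell
# Configure an out-of-source build with the BF16 path enabled.
# MNN_SUPPORT_BF16 is the option named in these notes; CMAKE_BUILD_TYPE
# is the standard CMake knob for an optimized build.
mkdir -p build && cd build
cmake .. -DMNN_SUPPORT_BF16=ON -DCMAKE_BUILD_TYPE=Release
make -j4
```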

Optimizations / Bugfixes:

  1. Fix a bug in the HIAI-NPU backend's multi-output handling
  2. Add the GatherV2 gradient for training
  3. Add fastTestTflite.py to verify TFLite -> MNN conversion correctness
  4. Add duplicate-Op fusion in graph optimization
  5. Add GridSample for CPU / Metal
  6. Fix a compile bug for Vulkan shaders with glslang 4.5
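For reference, GridSample (item 5) samples an input feature map at locations given by a normalized coordinate grid, as in the ONNX / PyTorch operator of the same name. The sketch below is a minimal single-channel NumPy illustration of bilinear sampling under the align_corners=True convention; it is not MNN's implementation, and the function name is hypothetical:

```python
import numpy as np

def grid_sample_bilinear(inp, grid):
    """Bilinear grid sampling sketch (align_corners=True convention).

    inp:  (H, W) single-channel feature map.
    grid: (Hg, Wg, 2) with normalized (x, y) coordinates in [-1, 1].
    """
    H, W = inp.shape
    # Map normalized [-1, 1] coordinates to pixel coordinates.
    x = (grid[..., 0] + 1) * (W - 1) / 2
    y = (grid[..., 1] + 1) * (H - 1) / 2
    # Corner indices, clamped to the image border.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    # Fractional offsets used as interpolation weights.
    wx = x - x0
    wy = y - y0
    top = inp[y0, x0] * (1 - wx) + inp[y0, x1] * wx
    bot = inp[y1, x0] * (1 - wx) + inp[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# An identity grid (corners at +/-1) reproduces the input exactly.
img = np.arange(4.0).reshape(2, 2)
ys, xs = np.meshgrid(np.linspace(-1, 1, 2), np.linspace(-1, 1, 2),
                     indexing="ij")
grid = np.stack([xs, ys], axis=-1)
out = grid_sample_bilinear(img, grid)
```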