Support BF16, Speed Up FP16, AVX512, Full Quantization
Pre-release
Features:
- Add BF16 support, currently optimized only for ARMv7-A / ARMv8; set MNN_SUPPORT_BF16 to enable it.
- Refactor FP16 (ARM82), yielding a 10%-20% speedup on benchmark models; add support for deconvolution / matmul.
- Support full Int8 compute, removing int8-float conversions for data-reorder Ops (in test)
- Support AVX512 for float compute
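The BF16 path above is gated by a CMake option. A minimal build sketch (only `MNN_SUPPORT_BF16` is named in these notes; the other option names are assumptions and may differ per MNN version):

```shell
# Configure MNN with the new BF16 path enabled (ARMv7-A / ARMv8 only).
# MNN_SUPPORT_BF16 is the flag named in this release; the commented
# flags below are assumed names for the FP16 (ARM82) and AVX512 paths.
mkdir -p build && cd build
cmake .. -DMNN_SUPPORT_BF16=ON
# cmake .. -DMNN_ARM82=ON     # assumed flag for the refactored FP16 path
# cmake .. -DMNN_AVX512=ON    # assumed flag for the AVX512 float kernels
make -j4
```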
Optimize / Bugfix:
- Fix a bug with multiple outputs in the HIAI-NPU backend
- Add GatherV2 gradient for training
- Add fastTestTflite.py to verify TFLite -> MNN conversion correctness
- Add duplicate-Op fusion in graph optimization
- Add GridSample support for the CPU / Metal backends
- Fix a compile error for Vulkan shaders with glslang 4.5
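For context on the new GridSample op: as commonly defined (e.g. in PyTorch/ONNX), it samples an input tensor at locations given by a grid of normalized coordinates in [-1, 1]. A minimal single-channel NumPy sketch of nearest-neighbor sampling with an align_corners-style mapping (the function name and conventions are illustrative, not MNN's API):

```python
import numpy as np

def grid_sample_nearest(inp, grid):
    """Nearest-neighbor grid sampling sketch.

    inp:  (H, W) single-channel input.
    grid: (Hg, Wg, 2) normalized (x, y) coords in [-1, 1].
    Returns a (Hg, Wg) array sampled from inp.
    """
    H, W = inp.shape
    # Map normalized [-1, 1] coords to pixel indices (align_corners=True style).
    xs = (grid[..., 0] + 1.0) / 2.0 * (W - 1)
    ys = (grid[..., 1] + 1.0) / 2.0 * (H - 1)
    # Round to the nearest pixel and clamp to the image border.
    xi = np.clip(np.round(xs).astype(int), 0, W - 1)
    yi = np.clip(np.round(ys).astype(int), 0, H - 1)
    return inp[yi, xi]
```

Sampling with an identity grid (corners at the normalized extremes) reproduces the input, which is a quick sanity check for any GridSample implementation.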