DeepSparse v0.12.0

@jeanniefinks released this 22 Apr 13:44 · 5 commits to release/0.12 since this release · 01a427a

New Features:

Documentation:

Changes:

Performance:

  • Speedup for large batch sizes when using sync mode on AMD EPYC processors.
  • AVX2 improvements for quantized models:
    • Up to 40% speedup out of the box for dense quantized models.
    • Up to 20% speedup for pruned quantized BERT, ResNet-50, and MobileNet.
  • Speedup from sparsity realized for ConvInteger operators.
  • Model compilation time decreased on systems with many cores.
  • Multi-stream scheduler: certain computations that were previously performed at runtime are now precomputed.
  • Hugging Face Transformers integration updated to latest state from upstream main branch.

Resolved Issues:

  • When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine no longer disables optimizations, which previously resulted in very poor performance.
  • Users executing arch.bin now receive a correct architecture profile of their system.

Known Issues:

  • When running the DeepSparse engine on a system with a nonuniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable NM_SERIAL_UNIT_GENERATION=1.
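
The workaround above can be applied by setting the environment variable in the shell before launching the process that compiles the model. A minimal sketch (the inference command shown is a placeholder for your own entry point):

```shell
# Workaround for non-terminating model compilation on nonuniform
# topologies (e.g., AMD EPYC with some cores per CCX disabled).
export NM_SERIAL_UNIT_GENERATION=1

# Then run your DeepSparse workload as usual, e.g.:
# python my_inference_script.py   # placeholder
echo "NM_SERIAL_UNIT_GENERATION=${NM_SERIAL_UNIT_GENERATION}"
```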