27 Apr 17:44

tprimak

0e7ca73

v0.14

Performance optimizations

Improved fp32 Winograd convolution performance on Intel Xeon processors with Intel(R) AVX512 instruction set support.
Improved performance of depthwise separable convolutions on processors with Intel(R) SSE 4.2, Intel(R) AVX and Intel(R) AVX512 instruction set support.
Improved performance of backward propagation in GEMM-based convolutions.
Improved performance of auxiliary primitives for NHWC and NCHW data layouts.
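
For readers unfamiliar with the Winograd approach mentioned above, the following is a minimal sketch of the textbook 1D F(2,3) minimal-filtering transform, which computes two outputs of a 3-tap filter with four multiplications instead of six. It is an illustration only and not the library's implementation; the function names are hypothetical.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter over a 4-element tile."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same result as direct_f23, using 4 multiplications on transformed data."""
    # Filter transform (can be precomputed once per filter).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by elementwise multiplication.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


tile = [1.0, 2.0, 3.0, 4.0]
taps = [0.5, 1.0, -1.0]
print(direct_f23(tile, taps), winograd_f23(tile, taps))
```

The savings in multiplications come at the cost of extra additions and some loss of numerical precision, which is one reason Winograd kernels are usually offered as an alternative algorithm rather than a replacement for direct convolution.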

New functionality

Feature preview: Introduced recurrent neural network (RNN) support. This release includes training and inference support for uni- and bi-directional vanilla RNN and Long Short-Term Memory (LSTM) cells. Use of the new API is demonstrated with an example featuring LSTM model inference with attention based on Google Neural Machine Translation (GNMT) topology.
Added Winograd convolution implementation for int8 data type optimized for Intel Xeon processors with Intel AVX512 instruction set support. The implementation includes initial optimizations for future Intel Xeon processors with AVX512_VNNI instruction groups support.
Introduced deconvolution (or transposed convolution) primitive.
Introduced support for 3D spatial data in convolution and auxiliary primitives. The following primitives are optimized for 3D tensors:
- reorders
- convolution
- deconvolution
- batch normalization
- pooling
- eltwise
- concat
- inner product
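
As background for the deconvolution primitive listed above, here is a minimal 1D sketch of what a transposed convolution computes: each input element scatters a scaled copy of the kernel into the output. The functions are hypothetical illustrations, handle a single channel without padding, and do not reflect the library's API.

```python
def conv1d(x, w, stride=1):
    """Plain strided 1D convolution (cross-correlation), no padding."""
    n_out = (len(x) - len(w)) // stride + 1
    return [
        sum(x[i * stride + k] * w[k] for k in range(len(w)))
        for i in range(n_out)
    ]


def deconv1d(y, w, stride=1):
    """Transposed convolution: the adjoint of conv1d with the same w and stride."""
    n_out = (len(y) - 1) * stride + len(w)
    out = [0.0] * n_out
    for i, v in enumerate(y):
        for k, wk in enumerate(w):
            # Each input element adds a scaled copy of the kernel to the output.
            out[i * stride + k] += v * wk
    return out


print(deconv1d([1.0, 2.0], [1.0, 1.0, 1.0], stride=2))
```

With stride 2 and a 3-tap kernel, a 2-element input expands to 5 elements, which is the upsampling behaviour that makes this operation useful in decoder-style networks.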

Usability improvements

Added the build system flags -DWITH_TEST=OFF and -DWITH_EXAMPLE=OFF, which disable building tests and examples.
Added the -DLIB_SUFFIX flag, which appends a suffix to the name of the lib directory.
Added prepare_mkl.bat script that automates download of Intel MKL small libraries on Windows.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Zhong Cao @4pao, Dmitriy Gorokhov, Jian Tang @tensor-tang, Daniel M. Weeks @doctaweeks, Tony Wang @tonywang1990, Tao Lv @TaoLv and Xinyu Chen @xinyu-intel. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

05 Mar 22:39

tprimak

v0.13

dfe8f6d

v0.13

Performance optimizations

Added optimizations for future Intel(R) Xeon(R) processors with AVX512_VNNI instruction groups support. New instructions are used in direct convolutions with int8 and int16 data types.
Improved performance of int8 direct forward convolution on Intel Xeon processors with Intel AVX512 instruction set.
Improved performance of grouped convolutions and depthwise separable convolutions.

New functionality

Extended Batch Normalization to enable fused ReLU on forward and backward propagation.
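
To illustrate what fusing ReLU into batch normalization means, the sketch below normalizes a single channel and applies ReLU in the same pass. It also returns a mask of positive outputs, on the assumption that a backward pass would use such a mask to zero the gradients of clipped elements. The function name, parameters and the mask representation are hypothetical and not taken from the library.

```python
import math


def batch_norm_relu_forward(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel, scale/shift it, and apply ReLU in a single pass."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    y = [max(0.0, gamma * (v - mean) * inv_std + beta) for v in x]
    # Mask of elements that survived ReLU (assumed to be what backward needs).
    mask = [v > 0.0 for v in y]
    return y, mask


out, kept = batch_norm_relu_forward([1.0, 2.0, 3.0])
print(out, kept)
```

Fusing the two steps avoids writing the normalized tensor to memory and reading it back just to apply the activation.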

Usability improvements

Improved profiling and debugging capabilities:
- New verbose mode reports detailed information about each Intel MKL-DNN primitive call including primitive name, data layout, implementation and execution time.
- Instrumentation and tracing technology (ITT) enables profiling of JIT code with Intel(R) VTune(TM) Amplifier XE.
- JIT kernels can now be saved for inspection.
Extended documentation with details on int8 quantization, inference workflow, and fusion.
Added int8 inference example.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Patric Zhao @pengzhao-intel, Ashok Emani @ashokei, Erik Kruus @kruus and Dmitriy Gorokhov. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

29 Dec 23:04

vpirogov

v0.12

ede1e33

v0.12

Performance optimizations

Improved performance of fp32 direct and Winograd convolution on Intel(R) Xeon(R) processors with Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX512) support
Improved performance of int8 direct convolution on Intel Xeon processors with Intel AVX512 instruction set
Improved batch normalization performance on Intel Xeon processors with Intel AVX512 instruction set
Optimized dilated convolution backward propagation
Improved initialization time of GEMM-based convolution implementations

New functionality

Support for int8 inference. These functions support int8 data type:
- reorders (including quantization and dequantization)
- convolution
- pooling
- eltwise
- sum
- concat
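
The quantization and dequantization performed by reorders can be pictured with the following minimal sketch: an fp32 value is multiplied by a scale, rounded, and saturated to the signed 8-bit range. The scale convention (multiply on quantize, divide on dequantize), the rounding mode and the function names are assumptions for illustration, not the library's API.

```python
def quantize_s8(values, scale):
    """Scale, round to nearest (ties to even), and saturate to [-128, 127]."""
    out = []
    for v in values:
        q = round(v * scale)
        out.append(max(-128, min(127, q)))
    return out


def dequantize(quantized, scale):
    """Map int8 values back to floating point using the same scale."""
    return [q / scale for q in quantized]


q = quantize_s8([0.5, -1.0, 2.0, 0.02], scale=64.0)
print(q, dequantize(q, 64.0))
```

Values outside the representable range (2.0 with scale 64 here) are clamped rather than wrapped, so the choice of scale determines the trade-off between range and resolution.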
Layer fusion support with the new post-ops API. Functions that support fusion:
- forward convolution with eltwise for inference and training
- convolution with sum for inference
- batch normalization with eltwise for training
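
The idea behind post-ops fusion can be sketched as follows: instead of writing the convolution result to memory and running separate sum and eltwise passes, a chain of operations is applied to each accumulator before it is stored. This is a conceptual illustration only; the function names and the callable-based interface are hypothetical and are not the post-ops API itself.

```python
def sum_post_op(scale=1.0):
    """Add the (scaled) value already present in the destination."""
    return lambda acc, old: acc + scale * old


def relu_post_op():
    """Apply ReLU to the accumulator."""
    return lambda acc, old: max(0.0, acc)


def conv1d_with_post_ops(src, weights, dst, post_ops):
    """1D convolution that applies a chain of post-ops before each store."""
    k = len(weights)
    for i in range(len(src) - k + 1):
        acc = sum(src[i + j] * weights[j] for j in range(k))
        for op in post_ops:
            acc = op(acc, dst[i])
        dst[i] = acc
    return dst


dst = [0.5, 3.0, -2.0]
conv1d_with_post_ops([1.0, 2.0, 3.0, 4.0], [1.0, -1.0], dst,
                     [sum_post_op(), relu_post_op()])
print(dst)
```

A convolution followed by sum and then ReLU is the pattern found in residual blocks, which is one reason this kind of fusion matters for inference performance.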

API deprecations and breaking changes

The ReLU primitive is deprecated. Its functionality is now part of the eltwise primitive.
The merged convolution/ReLU primitive is deprecated. Its functionality is available through the new post-ops API.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as @kruus, Yong Wu, Daoxin Pan, and Zhiming Wang. We would also like to thank everyone who asked questions and reported issues.

* Other names and brands may be claimed as the property of others.

30 Oct 15:49

vpirogov

v0.11

ba482ec

v0.11

Performance optimizations

Improved convolution performance on future Intel(R) Xeon Phi(TM) processors with AVX512_4FMAPS and AVX512_4VNNIW instruction groups support
Improved convolution performance on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support
Improved performance of GEMM-based convolutions for small minibatches
Improved performance of Winograd convolution algorithm on Intel Xeon Phi processors.

New functionality

Added backpropagation support for dilated convolution.
Eltwise primitive is extended with support for square, abs, square root, linear, bounded ReLU, soft ReLU and logistic.
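
For reference, the sketch below gives the usual textbook definitions of the elementwise operations named above. The roles assigned to alpha and beta (slope and offset for linear, upper bound for bounded ReLU) are assumptions made for illustration, and the dispatch-by-string interface is hypothetical.

```python
import math


def eltwise(kind, x, alpha=0.0, beta=0.0):
    """Apply one elementwise operation to a scalar x."""
    if kind == "square":
        return x * x
    if kind == "abs":
        return abs(x)
    if kind == "sqrt":
        return math.sqrt(x)
    if kind == "linear":
        return alpha * x + beta
    if kind == "bounded_relu":
        # ReLU clipped from above at alpha.
        return min(alpha, max(0.0, x))
    if kind == "soft_relu":
        # Smooth approximation of ReLU: log(1 + exp(x)).
        return math.log1p(math.exp(x))
    if kind == "logistic":
        return 1.0 / (1.0 + math.exp(-x))
    raise ValueError(f"unknown eltwise kind: {kind}")


for name in ("square", "abs", "soft_relu", "logistic"):
    print(name, eltwise(name, -1.5))
```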

Usability improvements

Added macOS* support.

Breaking changes to the API

All real-valued parameters of op descriptors now have the float data type (previously double). This change breaks C API backward compatibility for the sum primitive. Please refer to 0bbb22e for details. The C++ API maintains backward compatibility.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Yu Yang @reyoung, Vladimir Mironov @vamironov, Nishant Patel @nbpatel, Leona Cook @indie, Jayaram Bobba @jbobba and Elena Gvozdeva. We would also like to thank everyone who asked questions and reported issues.

* Other names and brands may be claimed as the property of others.

11 Aug 20:18

vpirogov

v0.10

fbd4d7b

v0.10

Performance optimizations

Improved performance on processors with Intel(R) AVX512 instruction set support
Added optimizations for future Intel(R) Xeon Phi(TM) processors with AVX512_4FMAPS and AVX512_4VNNIW instruction groups support

New functionality

Added support for the Winograd convolution algorithm. The implementation has initial optimizations for Intel Xeon Phi processors with Intel AVX512 instruction set support.
Introduced elementwise primitive with 3 types of activations: ReLU (rectified linear unit), ELU (parametric exponential linear unit) and TANH (hyperbolic tangent non-linearity).
Added dilation support to forward convolution. The implementation is optimized for processors with Intel(R) SSE 4.2 and Intel(R) AVX instruction set support.
Feature preview: Added int16 support in convolution, ReLU, pooling and inner product for training. Added optimized s16s16s32 convolution flavor for future Intel Xeon Phi processors.
Feature preview: Added optimized pooling with int8 support.
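
The dilation support mentioned above can be illustrated with a minimal 1D sketch in which kernel taps are spaced `dilation` elements apart. The convention used here, where a dilation of 1 means an ordinary convolution, is an assumption for this example; some libraries count dilation from 0 instead. The function name is hypothetical.

```python
def dilated_conv1d(x, w, dilation=1):
    """1D convolution (cross-correlation) with taps spaced `dilation` apart."""
    # Number of input elements covered by one application of the kernel.
    span = (len(w) - 1) * dilation + 1
    return [
        sum(x[i + k * dilation] * w[k] for k in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_conv1d(signal, [1.0, 1.0], dilation=1))
print(dilated_conv1d(signal, [1.0, 1.0], dilation=2))
```

Dilation enlarges the receptive field without adding weights: a 2-tap kernel with dilation 2 covers 3 input elements.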

Usability improvements

Added Windows* support.
Added benchdnn test suite for comprehensive functional and performance testing of convolutions. The suite supports int8, int16 and fp32 data types.
Primitive implementation information can be queried using impl_info_str.

Deprecated functionality

ReLU primitive is deprecated and will be removed in future releases. Activation functions including ReLU are implemented in elementwise primitive.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Guenther Schmuelling @guschmue, Yong Wu, Dmitriy Gorokhov, Menon Jaikrishnan, Erik @kruus, Zhong Z Cao @4pao, Gleb Gladilov and @tensor-tang. We would also like to thank everyone who asked questions and reported issues.

* Other names and brands may be claimed as the property of others.

19 May 22:59

vpirogov

v0.9

7b103b5

v0.9

Performance optimizations

Improved performance on processors with Intel(R) AVX2 instruction set support
Improved performance on processors with Intel(R) AVX512 instruction set support
Added optimizations for Intel(R) Xeon processors with Intel AVX512 instruction set support
Added inference optimizations for Intel(R) Atom processors with Intel(R) SSE4.2 support
Added JIT implementation of SGEMM for Intel(R) Xeon Phi(TM) processors.

New functionality

Average pooling supports 'exclude padding' mode
LRN supports arbitrary local size
Feature preview: Added int8 support in convolution, ReLU, pooling and inner product. Added optimized u8s8u8 convolution flavor for Intel Xeon processors with Intel AVX512 instruction set support.
Feature preview: Added int16 support in convolution, ReLU, pooling and inner product. Added optimized s16s16s32 convolution flavor for future Intel Xeon Phi processors.
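
The difference made by the 'exclude padding' mode of average pooling can be seen in the following 1D sketch: at the borders, the window sum is divided either by the full kernel size or only by the number of real (non-padded) elements. The function and parameter names are hypothetical, and the sketch assumes zero padding smaller than the kernel size.

```python
def avg_pool1d(x, kernel, stride, pad, exclude_padding):
    """1D average pooling over zero-padded input."""
    n_out = (len(x) + 2 * pad - kernel) // stride + 1
    out = []
    for i in range(n_out):
        start = i * stride - pad
        # Only positions inside the original input contribute real values.
        valid = [x[j] for j in range(start, start + kernel) if 0 <= j < len(x)]
        divisor = len(valid) if exclude_padding else kernel
        out.append(sum(valid) / divisor)
    return out


data = [3.0, 6.0, 9.0]
print(avg_pool1d(data, kernel=3, stride=1, pad=1, exclude_padding=False))
print(avg_pool1d(data, kernel=3, stride=1, pad=1, exclude_padding=True))
```

Interior windows give the same result in both modes; only windows that overlap the padding differ.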

Usability improvements

Improved the build system to enable integration into other projects.
The Intel(R) OpenMP runtime is used when the library is built with the binary dependency.
Added a feature-based dispatcher to support a wide range of Intel(R) processors and compatible processors.

Thanks to the contributors

This release contains contributions from many Intel(R) Performance Libraries developers as well as Ismo Puustinen @ipuustin, Dmitry Gorokhov, Vladimir Dudnik @vladimir-dudnik, @pruthviIntel, and Chris Olivier @cjolivier01. We would also like to thank everyone who asked questions and reported issues.

25 Apr 15:01

vpirogov

v0.7

f0860e3

v0.7 Pre-release

Pre-release

Changes:

Improved performance on processors with Intel(R) AVX2 instruction set support
Improved performance on processors with Intel(R) AVX512 instruction set support
Extended backward propagation optimizations for Intel(R) AVX2 and Intel AVX512 instruction sets
Added SGEMM-based reference convolution implementation significantly improving performance for cases not covered by JIT convolution
Added a JIT version of the SGEMM function for the Intel(R) AVX2 instruction set. This change makes it possible to build an optimized Intel(R) MKL-DNN without the binary component.
Added backward propagation examples

07 Feb 21:49

vpirogov

v0.5

a96524b

v0.5 Pre-release

Pre-release

Changes:

Added runtime CPUID dispatching mechanism
Added initial Intel(R) AVX512 optimizations
Improved performance on processors with Intel(R) AVX2 instruction set support
Added initial backward propagation optimizations
Extended batch normalization primitive API with scale/shift and mean/variance parameters
Updated Xbyak to version 5.40

18 Nov 05:30

vpirogov

v0.3

19985aa

v0.3 Pre-release

Pre-release

Changes:

Added sum primitive
Added backward propagation reference implementation

10 Oct 09:49

vpirogov

v0.2

0fc7bd3

v0.2 Pre-release

Pre-release

Changes:

Added batch normalization
Added split and concat
Added local response normalization (LRN) within a channel
Added average pooling

Releases: oneapi-src/oneDNN

v0.14

Performance optimizations

New functionality

Usability improvements

Thanks to the contributors

v0.13

Performance optimizations

New functionality

Usability improvements

Thanks to the contributors

v0.12

Performance optimizations

New functionality

API deprecations and breaking changes

Thanks to the contributors

v0.11

Performance optimizations

New functionality

Usability improvements

Breaking changes to the API

Thanks to the contributors

v0.10

Performance optimizations

New functionality

Usability improvements

Deprecated functionality

Thanks to the contributors

v0.9

Performance optimizations

New functionality

Usability improvements

Thanks to the contributors

v0.7

v0.5

v0.3

v0.2