[Wait for #2615] Enable Mixed Precision Training in NNTrainer @open sesame 07/04 21:44 #2663
base: main
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2663. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments from reviewers quickly. Your PR must pass all verification processes of cibot before a review by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.
cibot: @jijoongmoon, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2663-202407030959240.31418895721436-8fc0c70f550dfc949b3c4f78ce925213b9ef0a3b/.
cibot: @jijoongmoon, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2663-202407031051270.26607799530029-f6ad00ff276a56a2016757221092cd4315f19580/.
@jijoongmoon, 💯 All CI checkers are successfully verified. Thanks.
This PR adds a loss scale parameter to the run context and uses it to update the MSE loss.
. Add a loss scale parameter to the RunLayerContext constructor
. Add an applyLossScale function to scale the derivative returned by the loss layer
. Change the MSE loss layer to apply the loss scale to the returned derivative
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR enables Mixed Precision Training. For now only FP16-FP32 is considered; additional test cases will be added.
. add getSortedLayerIdx to set the graph order for forwarding
. change clip_weights to lazy_apply_weights to cover both cases
. add forwarding_op to re-run forwarding from the layer whose gradient has NaN
. add a while loop to re-run backwarding after resetting the loss scale
. add setLossScale in RunLayerContext
. add a gradient check when mixed precision is enabled
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR adds an infinity check on Tensor data.
. rename hasNaN to isValid
. add an infinity check to the isValid function; it now checks both NaN and Inf
. modify blas_avx and blas_neon accordingly
. modify graph and model to check is_valid rather than has_nan
. add a unit test for the isValid function
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR changes the loss computation to use full precision rather than half precision in order to maintain accuracy. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR enables the Mixed Precision unit test with a Torch model. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR adds Torch mixed precision golden data generation and the input and output data for the test.
. some fixes to the test
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR includes more unit tests and fixes for mixed precision.
. Model unit test
. A 2-FC-layer model which generates NaN or Inf gradients from Torch
. MSE loss, checking the whole mixed precision training procedure
. Even though the FC model has only one weight, it is good enough to validate mixed precision
. The Torch model works in a similar way to NNTrainer
. Some fixes to the execution order of applying gradients when mixed precision is on
. Update SGD to support mixed precision training
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR updates the Conv2D layer to support Mixed Precision (FP16). It is based on PR nnstreamer#2579. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This commit enables mixed precision support for the LSTM layer. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR adds an execution mode parameter for compilation. The default is ml::train::ExecutionMode::TRAIN. Currently we do not support compiler optimizations for inference mode, such as batch normalization fusing, but more optimizations will be added depending on the execution mode. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR includes mixed precision support for the batch normalization layer. During training, the BN layer should run in full precision even with FP16 weight data; therefore, reading the FP16 data and converting the current weight and activation are required. For inference, compiler optimizations such as BN fusing are needed, so execution mode parameters for compilation are also included. Because of the complicated data conversion in the BN layer, test case generation also needs to be updated: it takes the FP16 input/output tensors and weights and converts the weights to FP32 for computation. For verification, the FP32 results are converted back to FP16. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
Enable mixed precision on the Reshape layer.
- The Reshape layer only changes dimensions, so change the dimensions and check the data type.
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Enable mixed precision on the Pooling2D layer.
- Modified the layer to cast properly in the FP16 case so that mixed precision works on the existing Pooling2D layer.
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
In this PR, when the l2norm of a gradient tensor is computed for gradient clipping, the tensor is first converted to full precision. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR adds mu and var backup tensors (mu_b, var_b) to restore the previous moving mean and moving variance for mixed precision training. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
In order to restore the previous iteration's data, this PR disables randomization of the mask when previous data needs to be restored. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR enables a check for whether previous data needs to be restored. By doing this, we can remove NaN or Inf data in Tensors during mixed precision training. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
We need to remove NaN or Inf values from a Tensor when calling setZero(). However, if we use sscal, NaN or Inf values still remain. This PR changes sscal to memset. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
This PR fixes some bugs when running Mixed Precision Training. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
Add an is_mixed variable to check whether this is mixed precision training, i.e. the model's weight type is not full precision. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
In the mixed precision computation of the BN layer, there is a bug related to the FP32 computation. The Adam update has a bug too. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
- Using unaligned memory may invoke SIGSEGV **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: skykongkong8 <ss.kong@samsung.com>
This PR includes changes in Android.mk to use builddir/android_build_result. To use it, a soft link to the android_build_result dir is necessary in the upper dir (../):
ln -s ../../builddir/android_build_result ../nntrainer
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
Force-pushed a68f133 to bb6a1ab
cibot: @jijoongmoon, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2663-202408281029580.59508800506592-bb6a1ab9425b5ebeb7bc30a30b4dbee527f5d24b/.
Force-pushed 60d8f3d to a0c3029
This PR includes fixes to use TensorV2. **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
Force-pushed a0c3029 to 6856df5
@jijoongmoon, 💯 All CI checkers are successfully verified. Thanks.
In this PR
This PR finalizes mixed precision support in NNTrainer.
It modifies the network graph, the layer node, and the layer implementations. However, it does not yet support mixed precision training for all layers.
It supports,
: input, fully_connected, activation, dropout, multiout, concat, lstm, reshape, permute, conv1d, conv2d, addition, batch normalization
: mse
: adam
Enabling mixed precision training for the other operations will definitely follow.
Self evaluation:
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
Commits to be reviewed in this PR
[ Model ] Fix the gradient clipping for the FP16 or Low bit Gradient
In this PR, when the l2norm of a gradient tensor is computed for gradient clipping, the tensor is first converted to full precision.
[ Layer ] Add mu and var backup tensors.
This PR adds mu and var backup tensors (mu_b, var_b) to restore the previous moving mean and moving variance for mixed precision training.
[ Layer ] prevent randomization when restoring the data
In order to restore the previous iteration's data, this PR disables randomization of the mask when previous data needs to be restored.
[ Context ] add check if it needs to restore previous data
This PR enables a check for whether previous data needs to be restored. By doing this, we can remove NaN or Inf data in Tensors during mixed precision training.
[ Tensor ] remove sscal to set zero.
We need to remove NaN or Inf values from a Tensor when calling setZero(). However, if we use sscal, NaN or Inf values still remain. This PR changes sscal to memset.
[ Mixed ] set initialize gradient in layers and bugfixes
This PR fixes some bugs when running Mixed Precision Training.
[ Mixed Training ] add is_mixed variable in weight
Add an is_mixed variable to check whether this is mixed precision training, i.e. the model's weight type is not full precision.
[ BUG FIX ] Fix bug for mixed precision
In the mixed precision computation of the BN layer, there is a bug related to the FP32 computation. The Adam update has a bug too.
[TEST] using builddir/android_build_result to build test
This PR includes changes in Android.mk to use builddir/android_build_result. To use it, a soft link to the android_build_result dir is necessary in the upper dir (../):
ln -s ../../builddir/android_build_result ../nntrainer