
[ matrix_transpose/bugfix ] Prevent reading/saving data from/to unallocated memory #2698

Merged · 1 commit into nnstreamer:main from skykongkong8:pr/transpose/biq16 on Aug 9, 2024

Conversation

skykongkong8 (Member)

  • The previous transpose kernel occasionally loaded from / stored to unallocated memory and then masked the spurious lanes.
  • Now it does not touch that memory in the first place; leftover elements are loaded with a for-loop instead (see the sketch after the table below).
  • This slows down the fp16 matrix transpose, but the cost is not dominant in total model latency:
| dim | before | after |
| --- | --- | --- |
| 87, 2049 | 884 ns | 16114 ns |
| 2048, 86 | 34019 ns | 82258 ns |
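
As a rough illustration of the change, here is a minimal sketch of the leftover-safe load pattern (not the PR's exact kernel; the helper name `load_leftover_f16` and the zeroed stack buffer are assumptions for illustration):

```cpp
#include <arm_neon.h> // NEON fp16 intrinsics (requires fp16 support, e.g. -march=armv8.2-a+fp16)

// Hypothetical helper: load up to 4 fp16 values without reading past the end
// of the source row. The old kernel did a full vld1_f16 and masked the
// out-of-range lanes afterwards; this version never touches them.
static inline float16x4_t load_leftover_f16(const float16_t *src, unsigned N) {
  if (N == 4)
    return vld1_f16(src);            // full 4-wide load is entirely in-bounds
  float16_t buf[4] = {0, 0, 0, 0};   // zero padding replaces the masking trick
  for (unsigned j = 0; j < N; ++j)   // for-loop reads only the N valid elements
    buf[j] = src[j];
  return vld1_f16(buf);
}
```

The per-element copy in the leftover path is what accounts for the slowdown in the table above.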

Self evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped

@taos-ci (Collaborator) commented on Aug 6, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2698. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review process by reviewers can start. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. In order to monitor the progress status of your PR in more detail, visit http://ci.nnstreamer.ai/.

@skykongkong8 skykongkong8 force-pushed the pr/transpose/biq16 branch 2 times, most recently from 79f34fe to de8b4da on August 6, 2024 08:27
[ matrix_transpose/bugfix ] Prevent reading/saving data from/to unallocated memory

- The previous transpose kernel occasionally loaded from / stored to unallocated memory and then masked the spurious lanes.
- Now it does not touch that memory in the first place; leftover elements are loaded with a for-loop instead.
- This slows down the fp16 matrix transpose, but the cost is not dominant in total model latency.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
      if (N == 4) {
        input[i] = vld1_f16(&src[i * ld_src]); // full 4-wide load, fully in-bounds
      } else {
        float16x4_t tmp = ZEROS;               // leftover path: start from zeros
A Contributor commented:

AFAIK attempting to read the unallocated memory only happens when i == M - 1.
Please let me know if I'm wrong.

@skykongkong8 (Member, Author) commented on Aug 6, 2024:

Your understanding is correct, but M inside this kernel does not mean the global M.
Here M ranges from 1 to 8 (the local row size).
And as you can see below, only the leftover local M is affected, since otherwise execution falls into the fixed-size kernels, or into leftover kernels with template parameter M = 4 or M = 8:

...
//   if (N % 8 > 0 && N % 8 < 4) {
        transpose_kernel_mxn_neon_128<4>(N - jb, &src[i * ld_src + jb], ld_src,
                                         &dst[i + jb * ld_dst], ld_dst);
...
//   } else {
      if (jb < N) {
        transpose_kernel_mxn_neon_256<8>(N - jb, &src[ib * ld_src + jb], ld_src,
                                         &dst[ib + jb * ld_dst], ld_dst);
      }
...

@skykongkong8 (Member, Author) commented on Aug 6, 2024:

And FYI, I think this won't harm total GEMM latency: 82258 ns is 0.082258 ms, while a GEMM computation at similar dimensions would take around 3~4 ms, so the transpose cost is quite trivial.
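
(A rough check with the numbers above, assuming the faster 3 ms GEMM estimate: 0.082258 ms / 3 ms ≈ 2.7 %, so even the slower transpose adds under 3 % to end-to-end GEMM time.)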

@taos-ci (Collaborator) left a comment:

@skykongkong8, 💯 All CI checkers are successfully verified. Thanks.

@jijoongmoon (Collaborator) left a comment:

LGTM

@jijoongmoon jijoongmoon merged commit aa1bddf into nnstreamer:main Aug 9, 2024
41 checks passed
@skykongkong8 skykongkong8 deleted the pr/transpose/biq16 branch August 16, 2024 01:23