Kyber ASM ARMv7E-M: added assembly code #7706

Open, 1 commit to merge into base: master

SparkiDev

Description

Improved performance by reimplementing kyber_ntt, kyber_invntt, kyber_basemul_mont, and kyber_basemul_mont_add in assembly.
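For readers unfamiliar with these routines, here is a minimal C sketch of the arithmetic they are built on: Montgomery reduction modulo q = 3329 and the degree-1 base multiplication used after the NTT. It follows the shape of the public Kyber reference code, not the wolfSSL source or the assembly in this PR; the names mont_reduce, fqmul and basemul are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define KYBER_Q    3329     /* Kyber modulus */
#define KYBER_QINV (-3327)  /* q^-1 mod 2^16 */

/* Montgomery reduction: returns r with r == a * 2^-16 (mod q) and |r| < q.
 * Requires |a| < q * 2^15. The narrowing casts assume the usual
 * two's-complement wraparound (true for GCC and Clang). */
static int16_t mont_reduce(int32_t a)
{
    int16_t t = (int16_t)((int16_t)a * KYBER_QINV);
    /* a - t*q is an exact multiple of 2^16, so division is exact */
    return (int16_t)((a - (int32_t)t * KYBER_Q) / 65536);
}

/* Multiply two coefficients, then Montgomery-reduce the product. */
static int16_t fqmul(int16_t a, int16_t b)
{
    assert(a > -KYBER_Q * 4 && a < KYBER_Q * 4); /* keep the product in range */
    return mont_reduce((int32_t)a * b);
}

/* Multiply a[0] + a[1]X by b[0] + b[1]X modulo (X^2 - zeta).
 * Every product carries an extra factor of 2^-16 from fqmul. */
static void basemul(int16_t r[2], const int16_t a[2], const int16_t b[2],
                    int16_t zeta)
{
    r[0]  = fqmul(a[1], b[1]);
    r[0]  = fqmul(r[0], zeta);
    r[0] += fqmul(a[0], b[0]);
    r[1]  = fqmul(a[0], b[1]);
    r[1] += fqmul(a[1], b[0]);
}
```

The NTT and inverse NTT apply fqmul inside butterfly loops over all 256 coefficients, which is why moving these few operations to assembly affects key generation, encapsulation and decapsulation alike.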

Testing

./configure '--disable-shared' '--enable-experimental' '--enable-kyber' '--enable-cryptonly' '--disable-rsa' '--disable-dh' '--disable-ecc' 'LDFLAGS=--static' '--host=armv7m' 'CC=arm-linux-gnueabi-gcc' '--enable-armasm'

Checklist

  • added tests
  • updated/added doxygen
  • updated appropriate READMEs
  • Updated manual and documentation

SparkiDev self-assigned this Jul 3, 2024
dgarske commented Jul 3, 2024

Tested on STM32H7A3ZI at 240 MHz (Cortex-M7)

Using:

#define WOLFSSL_EXPERIMENTAL_SETTINGS

#define WOLFSSL_SHA3
#define WOLFSSL_SHAKE128
#define WOLFSSL_SHAKE256

#define WOLFSSL_HAVE_KYBER
#define WOLFSSL_WC_KYBER
//#define WOLFSSL_KYBER_SMALL

#define WOLFSSL_ARMASM
#define WOLFSSL_ARMASM_INLINE
#define WOLFSSL_ARMASM_NO_HW_CRYPTO
#define WOLFSSL_ARMASM_NO_NEON
#define WOLFSSL_ARMASM_CRYPTO_SHA3
#define WOLFSSL_ARM_ARCH 7

Current Master (before this PR):

RNG                        975 KiB took 1.024 seconds,  952.148 KiB/s
SHA-256                      3 MiB took 1.004 seconds,    3.088 MiB/s
SHA3-224                     1 MiB took 1.012 seconds,    1.399 MiB/s
SHA3-256                     1 MiB took 1.016 seconds,    1.322 MiB/s
SHA3-384                     1 MiB took 1.000 seconds,    1.025 MiB/s
SHA3-512                   750 KiB took 1.016 seconds,  738.189 KiB/s
SHAKE128                     2 MiB took 1.004 seconds,    1.605 MiB/s
SHAKE256                     1 MiB took 1.015 seconds,    1.323 MiB/s
KYBER512    128  key gen       220 ops took 1.008 sec, avg 4.582 ms, 218.254 ops/sec
KYBER512    128    encap       202 ops took 1.000 sec, avg 4.950 ms, 202.000 ops/sec
KYBER512    128    decap       182 ops took 1.000 sec, avg 5.495 ms, 182.000 ops/sec
KYBER768    192  key gen       142 ops took 1.011 sec, avg 7.120 ms, 140.455 ops/sec
KYBER768    192    encap       124 ops took 1.000 sec, avg 8.065 ms, 124.000 ops/sec
KYBER768    192    decap       114 ops took 1.008 sec, avg 8.842 ms, 113.095 ops/sec
KYBER1024   256  key gen        92 ops took 1.011 sec, avg 10.989 ms, 90.999 ops/sec
KYBER1024   256    encap        82 ops took 1.012 sec, avg 12.341 ms, 81.028 ops/sec
KYBER1024   256    decap        76 ops took 1.016 sec, avg 13.368 ms, 74.803 ops/sec

With PR 7706:

RNG                        975 KiB took 1.016 seconds,  959.646 KiB/s
SHA-256                      3 MiB took 1.004 seconds,    2.967 MiB/s
SHA3-224                     1 MiB took 1.015 seconds,    1.395 MiB/s
SHA3-256                     1 MiB took 1.000 seconds,    1.318 MiB/s
SHA3-384                     1 MiB took 1.004 seconds,    1.021 MiB/s
SHA3-512                   750 KiB took 1.019 seconds,  736.016 KiB/s
SHAKE128                     2 MiB took 1.008 seconds,    1.599 MiB/s
SHAKE256                     1 MiB took 1.004 seconds,    1.313 MiB/s
KYBER512    128  key gen       238 ops took 1.000 sec, avg 4.202 ms, 238.000 ops/sec
KYBER512    128    encap       226 ops took 1.004 sec, avg 4.442 ms, 225.100 ops/sec
KYBER512    128    decap       212 ops took 1.000 sec, avg 4.717 ms, 212.000 ops/sec
KYBER768    192  key gen       156 ops took 1.008 sec, avg 6.462 ms, 154.762 ops/sec
KYBER768    192    encap       140 ops took 1.012 sec, avg 7.229 ms, 138.340 ops/sec
KYBER768    192    decap       132 ops took 1.007 sec, avg 7.629 ms, 131.082 ops/sec
KYBER1024   256  key gen       102 ops took 1.016 sec, avg 9.961 ms, 100.394 ops/sec
KYBER1024   256    encap        90 ops took 1.000 sec, avg 11.111 ms, 90.000 ops/sec
KYBER1024   256    decap        86 ops took 1.000 sec, avg 11.628 ms, 86.000 ops/sec
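
To put the two runs side by side, here is a small C sketch that computes the throughput gain per operation. The ops/sec figures are copied from the logs above; the helper names (speedup_percent, print_speedups) are made up for illustration. By this arithmetic the gain ranges from roughly 9% (KYBER512 key gen) to roughly 16% (KYBER512 decap).

```c
#include <assert.h>
#include <stdio.h>

/* Percentage gain in throughput going from `before` to `after` ops/sec. */
static double speedup_percent(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

struct kyber_result {
    const char* name;
    double before; /* ops/sec on master */
    double after;  /* ops/sec with PR 7706 */
};

/* ops/sec values taken from the two benchmark runs quoted above */
static const struct kyber_result results[] = {
    { "KYBER512  key gen", 218.254, 238.000 },
    { "KYBER512  encap",   202.000, 225.100 },
    { "KYBER512  decap",   182.000, 212.000 },
    { "KYBER768  key gen", 140.455, 154.762 },
    { "KYBER768  encap",   124.000, 138.340 },
    { "KYBER768  decap",   113.095, 131.082 },
    { "KYBER1024 key gen",  90.999, 100.394 },
    { "KYBER1024 encap",    81.028,  90.000 },
    { "KYBER1024 decap",    74.803,  86.000 },
};

/* Print one line per operation with its percentage gain. */
static void print_speedups(void)
{
    size_t i;
    for (i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
        printf("%-18s %+.1f%%\n", results[i].name,
               speedup_percent(results[i].before, results[i].after));
    }
}
```

Calling print_speedups() from main prints the nine gains. The hash and RNG rows are left out because this PR does not touch them and their numbers differ only by run-to-run noise.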

Note: the benchmark won't run Kyber unless it is invoked with -kyber or built with this patch:

diff --git a/wolfcrypt/benchmark/benchmark.c b/wolfcrypt/benchmark/benchmark.c
index 964f9ebd0..1082de63c 100644
--- a/wolfcrypt/benchmark/benchmark.c
+++ b/wolfcrypt/benchmark/benchmark.c
@@ -3593,17 +3593,17 @@ static void* benchmarks_do(void* args)
 #ifdef WOLFSSL_HAVE_KYBER
     if (bench_all || (bench_pq_asym_algs & BENCH_KYBER)) {
     #ifdef WOLFSSL_KYBER512
-        if (bench_pq_asym_algs & BENCH_KYBER512) {
+        if (bench_all || (bench_pq_asym_algs & BENCH_KYBER512)) {
             bench_kyber(KYBER512);
         }
     #endif
     #ifdef WOLFSSL_KYBER768
-        if (bench_pq_asym_algs & BENCH_KYBER768) {
+        if (bench_all || (bench_pq_asym_algs & BENCH_KYBER768)) {
             bench_kyber(KYBER768);
         }
     #endif
     #ifdef WOLFSSL_KYBER1024
-        if (bench_pq_asym_algs & BENCH_KYBER1024) {
+        if (bench_all || (bench_pq_asym_algs & BENCH_KYBER1024)) {
             bench_kyber(KYBER1024);
         }
     #endif
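
The effect of the patch can be reduced to the following sketch. The flag values here are hypothetical stand-ins (the real BENCH_KYBER* constants are defined in benchmark.c), and the sketch assumes the umbrella flag is the OR of the per-level bits. Before the patch, a run with bench_all set and no algorithm mask passes the outer check but fails the inner one, so nothing is benchmarked.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only */
enum {
    DEMO_KYBER512 = 0x1,
    DEMO_KYBER768 = 0x2,
    DEMO_KYBER    = DEMO_KYBER512 | DEMO_KYBER768
};

/* Inner check as on master: only the mask bit is consulted. */
static int runs_kyber512_before(int bench_all, unsigned mask)
{
    if (bench_all || (mask & DEMO_KYBER)) {
        if (mask & DEMO_KYBER512)
            return 1;
    }
    return 0;
}

/* Inner check with the patch: bench_all also enables the level. */
static int runs_kyber512_after(int bench_all, unsigned mask)
{
    if (bench_all || (mask & DEMO_KYBER)) {
        if (bench_all || (mask & DEMO_KYBER512))
            return 1;
    }
    return 0;
}
```

With bench_all = 1 and mask = 0, runs_kyber512_before returns 0 and runs_kyber512_after returns 1. When the mask bit is set explicitly, both return 1.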
