AES hash is significantly slower than fallback for short strings on Broadwell #66

Closed
as-com opened this issue Jan 10, 2021 · 12 comments

Comments

@as-com

as-com commented Jan 10, 2021

Tested on a Broadwell Xeon E5-2690 v4 with Rust Nightly (1.51, 2021-01-09). Times are AES hash vs. fallback:

  • "1": 3.07 ns vs. 1.90 ns
  • "123": 3.00 ns vs. 2.01 ns
  • "1234": 3.00 ns vs. 2.11 ns
  • "1234567": 2.99 ns vs. 2.10 ns
  • "12345678": 2.05 ns vs. 2.09 ns

This performance difference is very noticeable in some macrobenchmarks that involve aHash-powered hashmaps. If this is an inherent limitation of the AES-powered hash, perhaps it would be nice to have a feature flag or some other argument to force the use of the fallback hash if the hashed values are known to be short.
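To put the figures in the list above in relative terms, here is a small sketch that computes the AES-to-fallback time ratio for each string. The numbers are the medians quoted in this comment; the names `slowdown`, `MEASUREMENTS` and `report` are only illustrative.

```rust
/// Ratio of AES time to fallback time; values above 1.0 mean AES is slower.
fn slowdown(aes_ns: f64, fallback_ns: f64) -> f64 {
    aes_ns / fallback_ns
}

/// (input string, AES median in ns, fallback median in ns), copied from the list above.
const MEASUREMENTS: [(&str, f64, f64); 5] = [
    ("1", 3.07, 1.90),
    ("123", 3.00, 2.01),
    ("1234", 3.00, 2.11),
    ("1234567", 2.99, 2.10),
    ("12345678", 2.05, 2.09),
];

/// Builds one line per measurement, e.g. for printing in a report.
fn report() -> Vec<String> {
    MEASUREMENTS
        .iter()
        .map(|(s, aes, fb)| format!("{:?}: {:.2}x", s, slowdown(*aes, *fb)))
        .collect()
}
```

For the one-byte string this gives roughly 1.62x, while for the 8-byte string the ratio drops just below 1.0, which matches the cliff at 8 bytes visible in the raw results.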

Raw test results
aeshash/u8              time:   [883.26 ps 885.33 ps 887.31 ps]                        

aeshash/u16             time:   [848.89 ps 852.85 ps 856.76 ps]                         
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

aeshash/u32             time:   [837.57 ps 841.42 ps 845.60 ps]                         
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

aeshash/u64             time:   [844.28 ps 848.31 ps 852.62 ps]                         
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

aeshash/u128            time:   [634.65 ps 637.45 ps 640.59 ps]                          
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

aeshash/string/"1"      time:   [3.0568 ns 3.0707 ns 3.0857 ns]                                
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
aeshash/string/"123"    time:   [2.9733 ns 3.0039 ns 3.0427 ns]                                  
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
aeshash/string/"1234"   time:   [2.9937 ns 3.0096 ns 3.0261 ns]                                   
aeshash/string/"1234567"                                                                             
                        time:   [2.9739 ns 2.9858 ns 2.9995 ns]
aeshash/string/"12345678"                                                                             
                        time:   [2.0422 ns 2.0526 ns 2.0634 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345"                                                                             
                        time:   [2.1141 ns 2.1215 ns 2.1289 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
aeshash/string/"1234567890123456"                                                                             
                        time:   [2.0369 ns 2.0457 ns 2.0556 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
aeshash/string/"123456789012345678901234"                                                                             
                        time:   [2.2794 ns 2.2919 ns 2.3055 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
aeshash/string/"123456789012345678901234567890123"                                                                             
                        time:   [3.6343 ns 3.6497 ns 3.6677 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
aeshash/string/"12345678901234567890123456789012345678901234567890123456789012345678"                                                                             
                        time:   [8.6159 ns 8.6649 ns 8.7177 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234...                                                                             
                        time:   [11.947 ns 12.029 ns 12.107 ns]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high severe
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234... #2                                                                             
                        time:   [44.972 ns 45.239 ns 45.515 ns]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
fallback/u8             time:   [888.06 ps 889.87 ps 891.68 ps]                         
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

fallback/u16            time:   [881.25 ps 884.31 ps 888.02 ps]                          
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

fallback/u32            time:   [888.11 ps 891.69 ps 895.86 ps]                          
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

fallback/u64            time:   [881.68 ps 883.99 ps 886.34 ps]                          
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

fallback/u128           time:   [681.99 ps 683.29 ps 684.65 ps]                           
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

fallback/string/"1"     time:   [1.9006 ns 1.9042 ns 1.9079 ns]                                 
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
fallback/string/"123"   time:   [2.0054 ns 2.0109 ns 2.0163 ns]                                   
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"1234"  time:   [2.0983 ns 2.1073 ns 2.1166 ns]                                    
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
fallback/string/"1234567"                                                                             
                        time:   [2.0951 ns 2.1031 ns 2.1110 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
fallback/string/"12345678"                                                                             
                        time:   [2.0800 ns 2.0892 ns 2.0982 ns]
fallback/string/"123456789012345"                                                                             
                        time:   [2.3176 ns 2.3222 ns 2.3268 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/string/"1234567890123456"                                                                             
                        time:   [2.3022 ns 2.3065 ns 2.3108 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"123456789012345678901234"                                                                             
                        time:   [3.5435 ns 3.5927 ns 3.6562 ns]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe
fallback/string/"123456789012345678901234567890123"                                                                             
                        time:   [4.8958 ns 4.9083 ns 4.9210 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678"                                                                             
                        time:   [7.5150 ns 7.5410 ns 7.5667 ns]
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123...                                                                             
                        time:   [12.932 ns 12.951 ns 12.972 ns]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123... #2                                                                            
                        time:   [98.567 ns 98.730 ns 98.885 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
@tkaitchuck
Owner

Yes, on Intel systems a major optimization introduced in Skylake reduced the latency of the AES instruction from 7 to 4 cycles. See: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=aes&expand=227

Note that this affected the latency but not the throughput, so if the instructions are properly aligned and pipelined (assuming there is other work to do), the delay should not be an issue. If it's showing up in macrobenchmarks, that obviously isn't happening.
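The latency-versus-throughput distinction can be illustrated with a deliberately idealized model. It assumes, purely for illustration, a fully pipelined unit that can start one operation per cycle; real cores are more complicated, and the function names are made up for this sketch.

```rust
/// Cycles for `n` operations where each one needs the previous result
/// (a dependency chain): every operation pays the full latency.
fn dependent_chain_cycles(n: u64, latency: u64) -> u64 {
    n * latency
}

/// Cycles for `n` independent operations on a fully pipelined unit that can
/// start one operation per cycle: only the last one's latency is exposed.
fn pipelined_cycles(n: u64, latency: u64) -> u64 {
    if n == 0 {
        0
    } else {
        latency + (n - 1)
    }
}
```

With four chained operations the model gives 28 cycles at latency 7 and 16 cycles at latency 4. With four independent operations it gives 10 and 7 cycles, so the latency gap matters far less once the work can overlap.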

From the raw results above, strings shorter than 8 characters are slower than longer ones. I've seen that before, and it came down to code alignment. It is possible to force alignment, and I experimented with that, but couldn't find anything that consistently gave better results than letting the compiler figure it out.

But that's not likely the same issue you are seeing in the macrobenchmarks (or at least not entirely). It is possible to switch the algorithm based on the type on nightly, using specialization. However, it is not possible to dispatch based on the size. So I cannot change which algorithm is used, only the update function within the algorithm. There is already a code path for < 8 byte strings in the AES variant. It should be possible to replace this with an alternate update function, but it's not as straightforward as copying the code from the fallback, because the fallback is designed to work with a 64-bit state rather than a 128-bit one, and the order of the update is different.
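As a rough sketch of what "changing the update function within the algorithm" means: the hasher type is fixed at compile time, but its `write` method can still branch on the input length at runtime. `ToyHasher` and its mixing step are hypothetical and are not aHash's AES or fallback code; only the shape of the length branch is the point.

```rust
use std::hash::Hasher;

/// Hypothetical hasher for illustration only; the mixing below is a toy,
/// not aHash's AES or fallback algorithm.
struct ToyHasher {
    state: u128,
}

impl ToyHasher {
    fn new(seed: u128) -> Self {
        ToyHasher { state: seed }
    }

    /// Cheap path: pack up to 7 bytes plus the length into a single u64.
    fn update_short(&mut self, bytes: &[u8]) {
        debug_assert!(bytes.len() < 8);
        let mut buf = [0u8; 8];
        buf[..bytes.len()].copy_from_slice(bytes);
        buf[7] = bytes.len() as u8; // byte 7 is free because len < 8
        self.mix(u64::from_le_bytes(buf) as u128);
    }

    /// General path: consume 16-byte blocks, zero-padding the last one.
    fn update_long(&mut self, bytes: &[u8]) {
        for chunk in bytes.chunks(16) {
            let mut buf = [0u8; 16];
            buf[..chunk.len()].copy_from_slice(chunk);
            self.mix(u128::from_le_bytes(buf));
        }
        self.mix(bytes.len() as u128);
    }

    /// Toy mixing step: xor, multiply by an odd constant, rotate.
    fn mix(&mut self, value: u128) {
        self.state = (self.state ^ value)
            .wrapping_mul(0x9E37_79B9_7F4A_7C15_F39C_C060_5CED_C835)
            .rotate_left(29);
    }
}

impl Hasher for ToyHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Only this runtime branch can depend on the input length.
        if bytes.len() < 8 {
            self.update_short(bytes);
        } else {
            self.update_long(bytes);
        }
    }

    fn finish(&self) -> u64 {
        // Fold the 128-bit state down to 64 bits.
        (self.state ^ (self.state >> 64)) as u64
    }
}

/// Convenience wrapper: hash one byte slice with a given seed.
fn hash_bytes(seed: u128, bytes: &[u8]) -> u64 {
    let mut hasher = ToyHasher::new(seed);
    hasher.write(bytes);
    hasher.finish()
}
```

Note that the short path here still feeds a 128-bit state, which is the constraint described above: a short-input routine written for a 64-bit state cannot simply be pasted in.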

I have to think more about how to deal with this. If you have any ideas, let me know.

@tkaitchuck
Owner

I thought of a way to reduce it to 3 AES rounds instead of 4. If we want to go beyond that, we might need more information.
@as-com You mentioned "cases where the values are known to be short"; are they also known to be of fixed length? I could make it a LOT faster if I knew the exact length.
Also, I never asked: are you OK with a nightly-only solution?

@as-com
Author

as-com commented Jan 11, 2021

The macrobenchmark involves handling JSON documents with keys that are of variable lengths using IndexMap, but based on the performance numbers (10-20% performance decrease with AES enabled), and based on examination of the documents, I would presume that the key lengths skew a lot shorter.
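One way to check the presumption about key lengths rather than eyeballing the documents is to measure the share of keys below a given byte length. This is a minimal sketch using only the standard library; the function name and the idea of an 8-byte threshold (taken from the benchmark cliff above) are assumptions, and collecting the keys from the JSON documents is left out.

```rust
/// Fraction of keys whose UTF-8 byte length is below `threshold`.
/// Returns 0.0 for an empty input.
fn share_shorter_than<'a, I>(keys: I, threshold: usize) -> f64
where
    I: IntoIterator<Item = &'a str>,
{
    let (mut short, mut total) = (0usize, 0usize);
    for key in keys {
        total += 1;
        if key.len() < threshold {
            short += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        short as f64 / total as f64
    }
}
```

For a made-up key set such as `["id", "name", "created_at", "description"]` and a threshold of 8, the result is 0.5.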

A nightly only solution would be fine for my use-case, but probably not for most people using this library.

@tkaitchuck
Owner

@as-com Can you run a test with the short-string branch and let me know how that performs?

@as-com
Author

as-com commented Jan 15, 2021

Testing on the same machine with Rust Nightly (1.51, 2021-01-14), the performance appears to be significantly improved, but still slightly slower than fallback:

aeshash/u8              time:   [838.63 ps 843.78 ps 849.01 ps]

aeshash/u16             time:   [859.63 ps 863.63 ps 867.29 ps]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

aeshash/u32             time:   [859.04 ps 862.68 ps 866.58 ps]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

aeshash/u64             time:   [879.67 ps 885.40 ps 892.14 ps]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

aeshash/u128            time:   [673.11 ps 675.10 ps 677.13 ps]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

aeshash/string/"1"      time:   [1.8148 ns 1.8236 ns 1.8322 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
aeshash/string/"123"    time:   [1.8979 ns 1.9030 ns 1.9082 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
aeshash/string/"1234"   time:   [1.8842 ns 1.8959 ns 1.9090 ns]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
aeshash/string/"1234567"
                        time:   [1.8842 ns 1.8890 ns 1.8940 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
aeshash/string/"12345678"
                        time:   [1.8769 ns 1.8836 ns 1.8901 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345"
                        time:   [2.2297 ns 2.2353 ns 2.2407 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
aeshash/string/"1234567890123456"
                        time:   [2.2013 ns 2.2107 ns 2.2207 ns]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
aeshash/string/"123456789012345678901234"
                        time:   [2.4521 ns 2.4714 ns 2.4917 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"123456789012345678901234567890123"
                        time:   [3.8210 ns 3.8404 ns 3.8598 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"12345678901234567890123456789012345678901234567890123456789012345678"
                        time:   [8.8334 ns 8.9073 ns 9.0093 ns]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234...
                        time:   [12.059 ns 12.151 ns 12.244 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234... #2
                        time:   [47.253 ns 47.345 ns 47.438 ns]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/u8             time:   [877.39 ps 879.56 ps 881.71 ps]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) low mild
  3 (3.00%) high mild

fallback/u16            time:   [859.76 ps 863.24 ps 866.81 ps]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

fallback/u32            time:   [852.12 ps 854.74 ps 857.55 ps]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

fallback/u64            time:   [848.59 ps 851.24 ps 853.99 ps]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

fallback/u128           time:   [652.45 ps 654.71 ps 657.22 ps]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

fallback/string/"1"     time:   [1.5380 ns 1.5453 ns 1.5529 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/string/"123"   time:   [1.6662 ns 1.6714 ns 1.6774 ns]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
fallback/string/"1234"  time:   [1.6522 ns 1.6641 ns 1.6777 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
fallback/string/"1234567"
                        time:   [1.6455 ns 1.6512 ns 1.6574 ns]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678"
                        time:   [1.6974 ns 1.7238 ns 1.7596 ns]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
fallback/string/"123456789012345"
                        time:   [1.9137 ns 1.9221 ns 1.9307 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
fallback/string/"1234567890123456"
                        time:   [1.9110 ns 1.9195 ns 1.9281 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"123456789012345678901234"
                        time:   [3.0018 ns 3.0198 ns 3.0391 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"123456789012345678901234567890123"
                        time:   [4.1376 ns 4.1557 ns 4.1749 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678"
                        time:   [6.4612 ns 6.4912 ns 6.5258 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123...
                        time:   [10.594 ns 10.627 ns 10.665 ns]
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123... #2
                        time:   [71.164 ns 71.403 ns 71.668 ns]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

In the JSON-handling macrobenchmark, performance appears to be mostly unchanged compared to the master branch with AES enabled (i.e. still slower by 10% or so).

I suspect there is some weirdness going on with AES instructions stalling something, or something else.

@tkaitchuck
Owner

Well, there is also the size of the state. The AES version needs to create keys which consist of three 128-bit values. The fallback only needs four 64-bit values. So the cost of instantiating the hasher may also differ.
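The size difference is easy to see with `std::mem::size_of`. The two structs below are hypothetical stand-ins that only mirror the counts mentioned above (three 128-bit values versus four 64-bit values); they are not aHash's actual types or field names.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for an AES-style state: three 128-bit values.
#[allow(dead_code)]
struct WideState {
    a: u128,
    b: u128,
    c: u128,
}

/// Hypothetical stand-in for a fallback-style state: four 64-bit values.
#[allow(dead_code)]
struct NarrowState {
    a: u64,
    b: u64,
    c: u64,
    d: u64,
}

/// Returns (wide, narrow) sizes in bytes.
fn state_sizes() -> (usize, usize) {
    (size_of::<WideState>(), size_of::<NarrowState>())
}
```

That is 48 bytes against 32 bytes, so every hasher construction has to initialize half again as much state in the wide case.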

@tkaitchuck
Owner

I had to restructure the way specialization worked, and the approach I had earlier won't work.
I have pushed an update to the branch which, on my computer, brings them to parity. I'm guessing there still might be a gap on Broadwell. I am not sure if there is a way to improve it further.

@as-com Let me know how this performs for you.

@as-com
Author

as-com commented Jan 23, 2021

On Rust Nightly (1.51 2021-01-22), performance is either on-par or improved in the benchmark:

aeshash/u8              time:   [879.92 ps 889.53 ps 902.41 ps]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

aeshash/u16             time:   [841.33 ps 855.18 ps 871.20 ps]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

aeshash/u32             time:   [837.23 ps 845.40 ps 854.08 ps]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

aeshash/u64             time:   [839.42 ps 846.51 ps 853.51 ps]

aeshash/u128            time:   [640.35 ps 646.60 ps 653.21 ps]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

aeshash/string/"1"      time:   [2.1187 ns 2.1650 ns 2.2251 ns]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
aeshash/string/"123"    time:   [2.0345 ns 2.0509 ns 2.0680 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
aeshash/string/"1234"   time:   [2.0101 ns 2.0278 ns 2.0466 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
aeshash/string/"1234567"
                        time:   [1.9864 ns 1.9976 ns 2.0096 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"12345678"
                        time:   [1.9647 ns 1.9807 ns 1.9974 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"123456789012345"
                        time:   [1.9440 ns 1.9554 ns 1.9676 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
aeshash/string/"1234567890123456"
                        time:   [1.9643 ns 1.9782 ns 1.9924 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
aeshash/string/"123456789012345678901234"
                        time:   [2.2632 ns 2.2803 ns 2.2979 ns]
aeshash/string/"123456789012345678901234567890123"
                        time:   [3.6303 ns 3.6548 ns 3.6815 ns]
aeshash/string/"12345678901234567890123456789012345678901234567890123456789012345678"
                        time:   [8.8100 ns 8.8864 ns 8.9678 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234...
                        time:   [11.733 ns 11.808 ns 11.889 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
aeshash/string/"123456789012345678901234567890123456789012345678901234567890123456789012345678901234... #2
                        time:   [46.283 ns 46.689 ns 47.089 ns]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  9 (9.00%) high mild

     Running target/release/deps/map-603ff0ff8955ff4d
aes_words               time:   [5.8660 ms 6.0004 ms 6.1419 ms]
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high mild
fallback/u8             time:   [830.33 ps 836.74 ps 843.80 ps]

fallback/u16            time:   [824.05 ps 829.57 ps 835.45 ps]

fallback/u32            time:   [830.78 ps 836.95 ps 844.24 ps]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

fallback/u64            time:   [846.38 ps 852.69 ps 859.27 ps]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

fallback/u128           time:   [639.81 ps 644.63 ps 649.74 ps]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

fallback/string/"1"     time:   [2.0946 ns 2.1116 ns 2.1292 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"123"   time:   [1.9612 ns 1.9743 ns 1.9881 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"1234"  time:   [2.0912 ns 2.1086 ns 2.1261 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
fallback/string/"1234567"
                        time:   [2.0701 ns 2.0833 ns 2.0978 ns]
fallback/string/"12345678"
                        time:   [2.0771 ns 2.0920 ns 2.1085 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"123456789012345"
                        time:   [2.0916 ns 2.1102 ns 2.1296 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
fallback/string/"1234567890123456"
                        time:   [2.0967 ns 2.1132 ns 2.1310 ns]
fallback/string/"123456789012345678901234"
                        time:   [3.9467 ns 3.9883 ns 4.0304 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
fallback/string/"123456789012345678901234567890123"
                        time:   [4.3666 ns 4.3950 ns 4.4253 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678"
                        time:   [6.6940 ns 6.7876 ns 6.8927 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123...
                        time:   [10.969 ns 11.039 ns 11.115 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
fallback/string/"12345678901234567890123456789012345678901234567890123456789012345678901234567890123... #2
                        time:   [73.150 ns 73.804 ns 74.545 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

In the JSON macrobenchmark, performance appears to be unchanged compared to the master branch. Disabling AES support appears to cause about a 6% increase in performance. Seems the larger state of the AES hash is the real problem here.

tkaitchuck added a commit that referenced this issue Jan 26, 2021
This allows strings less than 8 bytes to be special cased to avoid the AES call as mentioned in #66

Signed-off-by: Tom Kaitchuck <Tom.Kaitchuck@gmail.com>
@tkaitchuck
Owner

tkaitchuck commented Jan 27, 2021

@as-com can you try the json benchmark with HashBrown 0.10.0 on nightly? (I know it's yanked and am attempting to sort that out)

@as-com
Author

as-com commented Jan 31, 2021

Running the JSON benchmark with hashbrown's master branch (feature nightly enabled), aHash 0.7, and Rust Nightly (1.51, 2021-01-30), the performance with AES is 135.15 ops/sec, and the performance without AES is 147.04 ops/sec.

Compared to hashbrown 0.9.1 with aHash 0.7, performance is unchanged.
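For reference, the two throughput figures in this comment can be turned into a relative difference as follows. The helper name `percent_change` is illustrative.

```rust
/// Relative change from `baseline` to `other`, in percent.
/// Positive values mean `other` is higher than `baseline`.
fn percent_change(baseline: f64, other: f64) -> f64 {
    (other - baseline) / baseline * 100.0
}
```

`percent_change(135.15, 147.04)` comes out at about 8.8, meaning the run without AES completes roughly 8.8% more operations per second than the run with AES.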

@as-com
Author

as-com commented Mar 24, 2021

Note to self: run tests again in light of rust-lang/rust#83027 and rust-lang/rust#83084

@as-com
Author

as-com commented Mar 25, 2021

Update: the performance regression from using -C target-cpu=native to enable AES support on Broadwell appears to have disappeared on the latest Rust Nightly, and performance compared to disabling AES support is improved by a few percentage points. I'll consider this issue resolved.
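When comparing builds like this, it can help to confirm whether a given binary was actually compiled with AES enabled. The sketch below uses the standard `cfg!(target_feature = "aes")` check and, on x86, `std::is_x86_feature_detected!`. The function names are made up, and it assumes (as this comment implies) that the compile-time target feature is what selects the AES path.

```rust
/// True when the compiler was allowed to emit AES instructions, for example
/// via `-C target-cpu=native` on a CPU that has them.
fn aes_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "aes")
}

/// True when the CPU running this binary reports AES support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn aes_available_at_runtime() -> bool {
    std::is_x86_feature_detected!("aes")
}

/// On other architectures this sketch does no runtime probing and simply
/// reports the compile-time setting.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn aes_available_at_runtime() -> bool {
    aes_enabled_at_compile_time()
}

/// Human-readable summary of the two checks.
fn describe() -> &'static str {
    match (aes_enabled_at_compile_time(), aes_available_at_runtime()) {
        (true, _) => "AES instructions enabled at compile time",
        (false, true) => "CPU has AES, but the build does not enable it",
        (false, false) => "no AES support detected",
    }
}
```

Printing `describe()` at the start of a benchmark run makes it harder to mix up results from the AES and non-AES builds.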

@as-com as-com closed this as completed Mar 25, 2021