Keccak implementations discarting benches #948

CPerezz · 2022-11-29T10:06:42Z

In this PR we will run the three actual implementations of keccak with the ChallengeAPI already placed in.

With the results (which will be the SuperCircuit benchmarks where keccak is needed) we will chose only 1 implementation to keep and will remove the rest.

This should resolve: #941 and #943 as also the label to trigger these benchmarks was added.

Brechtpd · 2022-11-29T17:11:34Z

I think ideally the benching the combined super circuit + root (/aggregation circuit) makes sense to find the one that will be the best in practice, but I guess this may be a bit hard to do right now.
Currently we have to be a bit careful because the KECCAK_DEGREE env variable is now used to decide how big the lookup tables will be that are used by the circuits (which will have a big impact on the performance of the circuit because that will decide how many lookups/cell are required in the circuit). These lookup tables grow dynamically based on the height of the circuit (set by DEGREE) but this variable was renamed so the height of the lookup tables don't automatically follow the height of the circuit anymore (not sure why that was done). We need this height while building the circuit, unfortunately the halo2 API doesn't make it easy to get this circuit height there but now that I look at it again I think it shouldn't be too hard to pass it in as a parameter which would be much better. (EDIT: eh now I remember the problem, configure in the Circuit doesn't take self as an argument so no way to pass in a variable like that. Seems like a halo2 API change would be needed to do this unless I miss some other way to do this).

CPerezz · 2022-11-29T21:29:54Z

I think ideally the benching the combined super circuit + root (/aggregation circuit)

I do agree with that. But this type of benchmark just compiled and was run for the first time today thanks to some changes to the zkevm-chain params. So I think for selection purposes, with the SuperCircuit should be enough.

These lookup tables grow dynamically based on the height of the circuit (set by DEGREE) but this variable was renamed so the height of the lookup tables don't automatically follow the height of the circuit anymore (not sure why that was done).

Should we file an issue for that and work on it? If you want, you can take it directly @Brechtpd !

(EDIT: eh now I remember the problem, configure in the Circuit doesn't take self as an argument so no way to pass in a variable like that. Seems like a halo2 API change would be needed to do this unless I miss some other way to do this).

Well, here you can do various things. You can try to fetch an ENV var and if it doesn't give any value, default to something. Would that work?

JIC, I'll run the benches with the three impls since this might bring some light anywhay.

CPerezz · 2022-11-29T21:30:38Z

Results for Multi Packed

Start:   SuperCircuit Proof generation with degree = 19
End:     SuperCircuit Proof generation with degree = 19 ............................2902.707s
Start:   SuperCircuit Proof verification
End:     SuperCircuit Proof verification ...........................................289.750ms
test super_circuit::tests::bench_super_circuit_prover ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 8 filtered out; finished in 3028.61s

Maximum CPU Usage at 100.0%
Maximum Mem Usage at 251.7Gb

Brechtpd · 2022-11-29T21:53:41Z

Should we file an issue for that and work on it? If you want, you can take it directly @Brechtpd !

I don't know the reason for the change so would like to know that first before changing it back.

Well, here you can do various things. You can try to fetch an ENV var and if it doesn't give any value, default to something. Would that work?

Yeah that is what is done now using the KECCAK_DEGREE (which was previously also just DEGREE to automatically map to the same height) and it defaults to a small value (the lookup tables cannot be bigger than the circuit). I believe an env variable is not ideal because the circuit height in the code could of course bet set to anything (not just a value from an env variable). And the best height of the lookup tables is exactly the circuit height itself so should normally always be exactly the same. I think ideally the halo2 Circuit trait is modified so that configure takes a self parameter (just like synthesize does) so that it's possible for a circuit implementation to pass in additional data to the circuit building code. Halo2 kind of assumes that the constraints are independent of the circuit height, but with lookups that is not really true anymore (bigger lookup tables possible -> less lookups need to be generated).

CPerezz · 2022-11-29T22:22:37Z

I don't know the reason for the change so would like to know that first before changing it back.
I'd check with @ed255 or @ChihChengLiang . As I've been away for a couple months. And I'm missing some context on topics like this one.

As for self inside Circuit::configure() I belive @han0110 already discussed that with @str4d and there are some concerns about soundness issues . We could maybe file an issue in halo2 upstream and extend the discussion a bit more. But is a significant change, so I'd not expect to get it landed(in case it lands) soon.

lispc · 2022-11-30T00:14:12Z

I remember even if bit keccak is farest in super circuit, it is slowest in aggregation. Each column in target circuit brings huge overhead in aggregation. Previous we switched to multi_pack not because it is fast itself But it has least columns

lispc · 2022-11-30T00:16:52Z

In the pr to switch to multi pack I and Han already posted many realistic and theory results talking sub circuit and aggregation circuit

han0110 · 2022-11-30T02:50:03Z

As for self inside Circuit::configure() I belive @ han0110 already discussed that with @ str4d and there are some concerns about soundness issues . We could maybe file an issue in halo2 upstream and extend the discussion a bit more. But is a significant change, so I'd not expect to get it landed(in case it lands) soon.

@CPerezz To clarify a little bit I was not in the discussion actually (that happened in Halo 2 Ecosystem discord), I only know it's said we need to be careful about the API change to avoid enable soundness breakage (which makes sense), not directly leading to soundness issue, so perhaps we might also experiment it before upstream if it's really painful to pass many const generics.

CPerezz · 2022-11-30T10:38:53Z

Apologies @han0110 as I missunderstood you.

Do you think is worth running the benches on the aggregation circuit with the keccak impls? Or shall we just use multi-packed directly? @Brechtpd @han0110 @lispc

I know it's configurable as @Brechtpd told me. So I'm sure we can "shape" it to be longer (more rows) and less wide.
In any case, we know for sure AFAIS that Bit and Packed approches aren't good enough. So we might as well just remove them and stick to Multi-packed for now.

Does this work for you all?

CPerezz · 2022-11-30T12:58:05Z

Results for Bit Approach

Start:   SuperCircuit Proof generation with degree = 19
End:     SuperCircuit Proof generation with degree = 19 ............................2439.696s
Start:   SuperCircuit Proof verification
End:     SuperCircuit Proof verification ...........................................891.473ms
test super_circuit::tests::bench_super_circuit_prover ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 8 filtered out; finished in 2520.29s

Maximum CPU Usage at 100.0%
Maximum Mem Usage at 383.9Gb

Much bigger memory consuption while not giving any better performance. It's aprox 500s faster than multi-packed. Which means ~8-9 minutes more.

I think considering the difference with the main time taken to execute all, we should definitely just take keccak_multi_packed and remove the rest. As they don't seem to provide significant advantages.

Brechtpd · 2022-11-30T15:37:11Z

These were some results of the benches I ran on aws (c5.18xlarge). One extra caveat was that I used modified FFT implementation that is up to 3x faster than the one currently in halo2.

	1 core	2 cores	4 cores	8 cores	16 cores
SHA-256 (2^18)	15 hashes/s	26 hashes/s	45 hashes/s	75 hashes/s	116 hashes/s
KECCAK-256 (bit) (2^13)	2.1 hashes/s	3.4 hashes/s	5.7 hashes/s	8.7 hashes/s	11.5 hashes/s
KECCAK-256 (multi-packed) (2^16)	1.1 hashes/s	1.7 hashes/s	2.7 hashes/s	4.3 hashes/s	6.4 hashes/s

Another caveat is that the multi-packed was only run on a small circuit (2^16), and a higher circuit would also be significantly faster because of the reduced number of lookups required so the difference in practical scenario's will no doubt be smaller.

Then of course the other concern is performance of the aggregation circuit. For now multi-packed is the only one that provides enough flexibility to optimize the number of columns. However it is possible to modify the bit implementation to do something similar, it's just that this was harder than I thought with how some things are currently done so I didn't pursue it further. I think the biggest advantage of the bit implementation is that it's significantly simpler compared to the packed lookup-based ones, and could in theory could be a great implementation under something like HyperPlonk.

But I totally agree with how things are now multi-packed is the way to go.

CPerezz · 2022-11-30T17:09:52Z

Results for Packed Approach

running 1 test
Start:   Setup generation with degree = 19
End:     Setup generation with degree = 19 .........................................3.069s
test super_circuit::tests::bench_super_circuit_prover has been running for over 60 seconds
Start:   SuperCircuit Proof generation with degree = 19
Maximum CPU Usage at 81.9%
Maximum Mem Usage at 966.2Gb

This makes clear together with the comments that we should mainly just use multi_packed.
Therefore, I'm removing the other implementations.

CPerezz · 2022-11-30T19:39:17Z

I think based on the results and also, the input from @han0110 , @Brechtpd and @lispc we can all agree that the multi_packed keccak impl is the one we definitely want to keep.

Therefore, I've removed the two others, and I plan to address now with much more priority the work on #933

CPerezz · 2022-11-30T19:49:12Z

@Brechtpd could you also review this? You've been the biggest contributor in this area. So would be nice to count with your approval to do this!! I can't assign you directly.

…ns#948) * feat: Integrate KeccakPackedMulti with Challenge-API * fix: Clippy lints * fix: Pay privacy-scaling-explorations#925 tech debt * remove: Keccak packed implementation * remove: Keccak-bit implementation * change: Export KeccakPackedConfig as KeccakConfig * remove: Benchmarks of discarted keccak impls

…ns#948) * add ec pairing invalid len > 65536 test * refine test * refine * fix: call_data_len can be 4 bytes long

github-actions bot added crate-circuit-benchmarks Issues related to the circuit-benchmarks workspace member crate-zkevm-circuits Issues related to the zkevm-circuits workspace member T-bench Type: benchmark improvements labels Nov 29, 2022

CPerezz added the benchmarks: Super Triggers a Super Circuit benchmark label Nov 29, 2022

CPerezz added benchmarks: Super Triggers a Super Circuit benchmark and removed benchmarks: Super Triggers a Super Circuit benchmark labels Nov 29, 2022

This was referenced Nov 30, 2022

Explore the inclusion of &self as a parameter for Circuit::configure() privacy-scaling-explorations/halo2#106

Closed

Keccak Challenge-API integration #925

Merged

CPerezz added benchmarks: Super Triggers a Super Circuit benchmark and removed benchmarks: Super Triggers a Super Circuit benchmark labels Nov 30, 2022

CPerezz force-pushed the keccak-discard-benches branch from 9bb64a7 to 1abe872 Compare November 30, 2022 11:42

github-actions bot removed crate-circuit-benchmarks Issues related to the circuit-benchmarks workspace member T-bench Type: benchmark improvements labels Nov 30, 2022

CPerezz added benchmarks: Super Triggers a Super Circuit benchmark and removed benchmarks: Super Triggers a Super Circuit benchmark labels Nov 30, 2022

github-actions bot added the crate-circuit-benchmarks Issues related to the circuit-benchmarks workspace member label Nov 30, 2022

github-actions bot added the T-bench Type: benchmark improvements label Nov 30, 2022

CPerezz added 6 commits November 30, 2022 20:27

feat: Integrate KeccakPackedMulti with Challenge-API

efe0ca2

fix: Clippy lints

d3ed1c2

fix: Pay #925 tech debt

cadd12f

remove: Keccak packed implementation

9ea1ff9

remove: Keccak-bit implementation

b1b07ce

change: Export KeccakPackedConfig as KeccakConfig

6a58232

CPerezz force-pushed the keccak-discard-benches branch from 1ddd0f9 to 53f0f54 Compare November 30, 2022 19:36

remove: Benchmarks of discarted keccak impls

4a6b55a

CPerezz force-pushed the keccak-discard-benches branch from 53f0f54 to 4a6b55a Compare November 30, 2022 19:41

CPerezz requested a review from han0110 November 30, 2022 19:48

Brechtpd approved these changes Nov 30, 2022

View reviewed changes

lispc approved these changes Dec 1, 2022

View reviewed changes

CPerezz merged commit 7cd18d1 into main Dec 1, 2022

CPerezz deleted the keccak-discard-benches branch December 1, 2022 08:17

CPerezz mentioned this pull request Dec 13, 2022

Computation of data_rlc and hash_rlc in different ways for keccak #799

Open

naure mentioned this pull request Feb 23, 2023

Remove dependency on KECCAK_DEGREE and KECCAK_ROWS #1241

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keccak implementations discarting benches #948

Keccak implementations discarting benches #948

CPerezz commented Nov 29, 2022

Brechtpd commented Nov 29, 2022 •

edited

Loading

CPerezz commented Nov 29, 2022

CPerezz commented Nov 29, 2022

Brechtpd commented Nov 29, 2022

CPerezz commented Nov 29, 2022

lispc commented Nov 30, 2022

lispc commented Nov 30, 2022

han0110 commented Nov 30, 2022

CPerezz commented Nov 30, 2022

CPerezz commented Nov 30, 2022 •

edited

Loading

Brechtpd commented Nov 30, 2022

CPerezz commented Nov 30, 2022

CPerezz commented Nov 30, 2022

CPerezz commented Nov 30, 2022

Keccak implementations discarting benches #948

Keccak implementations discarting benches #948

Conversation

CPerezz commented Nov 29, 2022

Brechtpd commented Nov 29, 2022 • edited Loading

CPerezz commented Nov 29, 2022

CPerezz commented Nov 29, 2022

Brechtpd commented Nov 29, 2022

CPerezz commented Nov 29, 2022

lispc commented Nov 30, 2022

lispc commented Nov 30, 2022

han0110 commented Nov 30, 2022

CPerezz commented Nov 30, 2022

CPerezz commented Nov 30, 2022 • edited Loading

Brechtpd commented Nov 30, 2022

CPerezz commented Nov 30, 2022

CPerezz commented Nov 30, 2022

CPerezz commented Nov 30, 2022

Brechtpd commented Nov 29, 2022 •

edited

Loading

CPerezz commented Nov 30, 2022 •

edited

Loading