Skip to content
This repository has been archived by the owner on Jul 5, 2024. It is now read-only.

Keccak implementations discarting benches #948

Merged
merged 7 commits into from
Dec 1, 2022
Merged

Conversation

CPerezz
Copy link
Member

@CPerezz CPerezz commented Nov 29, 2022

In this PR we will run the three actual implementations of keccak with the ChallengeAPI already placed in.

With the results (which will be the SuperCircuit benchmarks where keccak is needed) we will chose only 1 implementation to keep and will remove the rest.

This should resolve: #941 and #943 as also the label to trigger these benchmarks was added.

@github-actions github-actions bot added crate-circuit-benchmarks Issues related to the circuit-benchmarks workspace member crate-zkevm-circuits Issues related to the zkevm-circuits workspace member T-bench Type: benchmark improvements labels Nov 29, 2022
@CPerezz CPerezz added the benchmarks: Super Triggers a Super Circuit benchmark label Nov 29, 2022
@Brechtpd
Copy link
Collaborator

Brechtpd commented Nov 29, 2022

  • I think ideally the benching the combined super circuit + root (/aggregation circuit) makes sense to find the one that will be the best in practice, but I guess this may be a bit hard to do right now.
  • Currently we have to be a bit careful because the KECCAK_DEGREE env variable is now used to decide how big the lookup tables will be that are used by the circuits (which will have a big impact on the performance of the circuit because that will decide how many lookups/cell are required in the circuit). These lookup tables grow dynamically based on the height of the circuit (set by DEGREE) but this variable was renamed so the height of the lookup tables don't automatically follow the height of the circuit anymore (not sure why that was done). We need this height while building the circuit, unfortunately the halo2 API doesn't make it easy to get this circuit height there but now that I look at it again I think it shouldn't be too hard to pass it in as a parameter which would be much better. (EDIT: eh now I remember the problem, configure in the Circuit doesn't take self as an argument so no way to pass in a variable like that. Seems like a halo2 API change would be needed to do this unless I miss some other way to do this).

@CPerezz
Copy link
Member Author

CPerezz commented Nov 29, 2022

I think ideally the benching the combined super circuit + root (/aggregation circuit)

I do agree with that. But this type of benchmark just compiled and was run for the first time today thanks to some changes to the zkevm-chain params. So I think for selection purposes, with the SuperCircuit should be enough.

These lookup tables grow dynamically based on the height of the circuit (set by DEGREE) but this variable was renamed so the height of the lookup tables don't automatically follow the height of the circuit anymore (not sure why that was done).

Should we file an issue for that and work on it? If you want, you can take it directly @Brechtpd !

(EDIT: eh now I remember the problem, configure in the Circuit doesn't take self as an argument so no way to pass in a variable like that. Seems like a halo2 API change would be needed to do this unless I miss some other way to do this).

Well, here you can do various things. You can try to fetch an ENV var and if it doesn't give any value, default to something. Would that work?

JIC, I'll run the benches with the three impls since this might bring some light anywhay.

@CPerezz
Copy link
Member Author

CPerezz commented Nov 29, 2022

Results for Multi Packed

Start:   SuperCircuit Proof generation with degree = 19
End:     SuperCircuit Proof generation with degree = 19 ............................2902.707s
Start:   SuperCircuit Proof verification
End:     SuperCircuit Proof verification ...........................................289.750ms
test super_circuit::tests::bench_super_circuit_prover ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 8 filtered out; finished in 3028.61s

Maximum CPU Usage at 100.0%
Maximum Mem Usage at 251.7Gb

@CPerezz CPerezz added benchmarks: Super Triggers a Super Circuit benchmark and removed benchmarks: Super Triggers a Super Circuit benchmark labels Nov 29, 2022
@Brechtpd
Copy link
Collaborator

Should we file an issue for that and work on it? If you want, you can take it directly @Brechtpd !

I don't know the reason for the change so would like to know that first before changing it back.

Well, here you can do various things. You can try to fetch an ENV var and if it doesn't give any value, default to something. Would that work?

Yeah that is what is done now using the KECCAK_DEGREE (which was previously also just DEGREE to automatically map to the same height) and it defaults to a small value (the lookup tables cannot be bigger than the circuit). I believe an env variable is not ideal because the circuit height in the code could of course bet set to anything (not just a value from an env variable). And the best height of the lookup tables is exactly the circuit height itself so should normally always be exactly the same. I think ideally the halo2 Circuit trait is modified so that configure takes a self parameter (just like synthesize does) so that it's possible for a circuit implementation to pass in additional data to the circuit building code. Halo2 kind of assumes that the constraints are independent of the circuit height, but with lookups that is not really true anymore (bigger lookup tables possible -> less lookups need to be generated).

@CPerezz
Copy link
Member Author

CPerezz commented Nov 29, 2022

I don't know the reason for the change so would like to know that first before changing it back.
I'd check with @ed255 or @ChihChengLiang . As I've been away for a couple months. And I'm missing some context on topics like this one.

As for self inside Circuit::configure() I belive @han0110 already discussed that with @str4d and there are some concerns about soundness issues . We could maybe file an issue in halo2 upstream and extend the discussion a bit more. But is a significant change, so I'd not expect to get it landed(in case it lands) soon.

@lispc
Copy link
Collaborator

lispc commented Nov 30, 2022

I remember even if bit keccak is farest in super circuit, it is slowest in aggregation. Each column in target circuit brings huge overhead in aggregation. Previous we switched to multi_pack not because it is fast itself But it has least columns

@lispc
Copy link
Collaborator

lispc commented Nov 30, 2022

In the pr to switch to multi pack I and Han already posted many realistic and theory results talking sub circuit and aggregation circuit

@han0110
Copy link
Collaborator

han0110 commented Nov 30, 2022

As for self inside Circuit::configure() I belive @ han0110 already discussed that with @ str4d and there are some concerns about soundness issues . We could maybe file an issue in halo2 upstream and extend the discussion a bit more. But is a significant change, so I'd not expect to get it landed(in case it lands) soon.

@CPerezz To clarify a little bit I was not in the discussion actually (that happened in Halo 2 Ecosystem discord), I only know it's said we need to be careful about the API change to avoid enable soundness breakage (which makes sense), not directly leading to soundness issue, so perhaps we might also experiment it before upstream if it's really painful to pass many const generics.

@CPerezz
Copy link
Member Author

CPerezz commented Nov 30, 2022

Apologies @han0110 as I missunderstood you.

Do you think is worth running the benches on the aggregation circuit with the keccak impls? Or shall we just use multi-packed directly? @Brechtpd @han0110 @lispc

I know it's configurable as @Brechtpd told me. So I'm sure we can "shape" it to be longer (more rows) and less wide.
In any case, we know for sure AFAIS that Bit and Packed approches aren't good enough. So we might as well just remove them and stick to Multi-packed for now.

Does this work for you all?

@CPerezz CPerezz added benchmarks: Super Triggers a Super Circuit benchmark and removed benchmarks: Super Triggers a Super Circuit benchmark labels Nov 30, 2022
@github-actions github-actions bot removed crate-circuit-benchmarks Issues related to the circuit-benchmarks workspace member T-bench Type: benchmark improvements labels Nov 30, 2022
@CPerezz CPerezz added benchmarks: Super Triggers a Super Circuit benchmark and removed benchmarks: Super Triggers a Super Circuit benchmark labels Nov 30, 2022
@CPerezz
Copy link
Member Author

CPerezz commented Nov 30, 2022

Results for Bit Approach

Start:   SuperCircuit Proof generation with degree = 19
End:     SuperCircuit Proof generation with degree = 19 ............................2439.696s
Start:   SuperCircuit Proof verification
End:     SuperCircuit Proof verification ...........................................891.473ms
test super_circuit::tests::bench_super_circuit_prover ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 8 filtered out; finished in 2520.29s

Maximum CPU Usage at 100.0%
Maximum Mem Usage at 383.9Gb

Much bigger memory consuption while not giving any better performance. It's aprox 500s faster than multi-packed. Which means ~8-9 minutes more.

I think considering the difference with the main time taken to execute all, we should definitely just take keccak_multi_packed and remove the rest. As they don't seem to provide significant advantages.

@CPerezz CPerezz added benchmarks: Super Triggers a Super Circuit benchmark and removed benchmarks: Super Triggers a Super Circuit benchmark labels Nov 30, 2022
@Brechtpd
Copy link
Collaborator

These were some results of the benches I ran on aws (c5.18xlarge). One extra caveat was that I used modified FFT implementation that is up to 3x faster than the one currently in halo2.

  1 core 2 cores 4 cores 8 cores 16 cores
SHA-256 (2^18) 15 hashes/s 26 hashes/s 45 hashes/s 75 hashes/s 116 hashes/s
KECCAK-256 (bit) (2^13) 2.1 hashes/s 3.4 hashes/s 5.7 hashes/s 8.7 hashes/s 11.5 hashes/s
KECCAK-256 (multi-packed) (2^16) 1.1 hashes/s 1.7 hashes/s 2.7 hashes/s 4.3 hashes/s 6.4 hashes/s

Another caveat is that the multi-packed was only run on a small circuit (2^16), and a higher circuit would also be significantly faster because of the reduced number of lookups required so the difference in practical scenario's will no doubt be smaller.

Then of course the other concern is performance of the aggregation circuit. For now multi-packed is the only one that provides enough flexibility to optimize the number of columns. However it is possible to modify the bit implementation to do something similar, it's just that this was harder than I thought with how some things are currently done so I didn't pursue it further. I think the biggest advantage of the bit implementation is that it's significantly simpler compared to the packed lookup-based ones, and could in theory could be a great implementation under something like HyperPlonk.

But I totally agree with how things are now multi-packed is the way to go.

@CPerezz
Copy link
Member Author

CPerezz commented Nov 30, 2022

Results for Packed Approach

running 1 test
Start:   Setup generation with degree = 19
End:     Setup generation with degree = 19 .........................................3.069s
test super_circuit::tests::bench_super_circuit_prover has been running for over 60 seconds
Start:   SuperCircuit Proof generation with degree = 19
Maximum CPU Usage at 81.9%
Maximum Mem Usage at 966.2Gb

This makes clear together with the comments that we should mainly just use multi_packed.
Therefore, I'm removing the other implementations.

@github-actions github-actions bot added the crate-circuit-benchmarks Issues related to the circuit-benchmarks workspace member label Nov 30, 2022
@github-actions github-actions bot added the T-bench Type: benchmark improvements label Nov 30, 2022
@CPerezz
Copy link
Member Author

CPerezz commented Nov 30, 2022

I think based on the results and also, the input from @han0110 , @Brechtpd and @lispc we can all agree that the multi_packed keccak impl is the one we definitely want to keep.

Therefore, I've removed the two others, and I plan to address now with much more priority the work on #933

@CPerezz
Copy link
Member Author

CPerezz commented Nov 30, 2022

@Brechtpd could you also review this? You've been the biggest contributor in this area. So would be nice to count with your approval to do this!! I can't assign you directly.

@CPerezz CPerezz merged commit 7cd18d1 into main Dec 1, 2022
@CPerezz CPerezz deleted the keccak-discard-benches branch December 1, 2022 08:17
jonathanpwang pushed a commit to jonathanpwang/zkevm-circuits that referenced this pull request Dec 29, 2022
…ns#948)

* feat: Integrate KeccakPackedMulti with Challenge-API

* fix: Clippy lints

* fix: Pay privacy-scaling-explorations#925 tech debt

* remove: Keccak packed implementation

* remove: Keccak-bit implementation

* change: Export KeccakPackedConfig as KeccakConfig

* remove: Benchmarks of discarted keccak impls
noel2004 pushed a commit to noel2004/zkevm-circuits that referenced this pull request Nov 29, 2023
…ns#948)

* add ec pairing invalid len > 65536 test

* refine test

* refine

* fix: call_data_len can be 4 bytes long
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
benchmarks: Super Triggers a Super Circuit benchmark crate-circuit-benchmarks Issues related to the circuit-benchmarks workspace member crate-zkevm-circuits Issues related to the zkevm-circuits workspace member T-bench Type: benchmark improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Benchmark and decide which keccak implementation variant we keep
4 participants