ZAL: ZK Accel Layer #308

Merged
merged 24 commits into from
Dec 4, 2023

mratsim (Owner) commented Dec 2, 2023

This implements the tentative ZK Accel Layer (ZAL) for Halo2-KZG, the Ethereum Foundation / Privacy & Scaling Explorations (PSE) prover.

ZAL is described and discussed here: privacy-scaling-explorations/halo2#216

This uses candidate traits from https://github.com/taikoxyz/halo2curves/blob/pr-pse-exec-engine/src/zal.rs#L42-L49 (commit https://github.com/taikoxyz/halo2curves/blob/049469172c64698dcbcc0b0fc00bcedc7695a888/src/zal.rs#L42-L49)

Benchmarks

On a Ryzen Pro 7840U (2023, 8 cores, low-power mobile 15W to 30W TDP)

[Two benchmark images; not reproduced here.]

For hardware providers

The C header for the MSM is:

void ctt_bn254_snarks_g1_jac_multi_scalar_mul_big_coefs_vartime_parallel(const ctt_threadpool* tp, bn254_snarks_g1_jac* r, const big254 coefs[], const bn254_snarks_g1_aff points[], size_t len);
void ctt_bn254_snarks_g1_jac_multi_scalar_mul_fr_coefs_vartime_parallel(const ctt_threadpool* tp, bn254_snarks_g1_jac* r, const bn254_snarks_fr coefs[], const bn254_snarks_g1_aff points[], size_t len);
void ctt_bn254_snarks_g1_prj_multi_scalar_mul_big_coefs_vartime_parallel(const ctt_threadpool* tp, bn254_snarks_g1_prj* r, const big254 coefs[], const bn254_snarks_g1_aff points[], size_t len);
void ctt_bn254_snarks_g1_prj_multi_scalar_mul_fr_coefs_vartime_parallel(const ctt_threadpool* tp, bn254_snarks_g1_prj* r, const bn254_snarks_fr coefs[], const bn254_snarks_g1_aff points[], size_t len);

The C header for the threadpool is:

typedef struct ctt_threadpool ctt_threadpool;

/** Create a new threadpool that manages `num_threads` threads
 *
 *  Initialize a threadpool that manages `num_threads` threads.
 *
 *  A Constantine's threadpool cannot be instantiated
 *  on a thread managed by another Constantine's threadpool
 *  including the root thread.
 *
 *  Mixing with other libraries' threadpools and runtime
 *  will not impact correctness but may impact performance.
 */
struct ctt_threadpool* ctt_threadpool_new(size_t num_threads);

/** Wait until all pending tasks are processed and then shutdown the threadpool
 */
void ctt_threadpool_shutdown(struct ctt_threadpool* threadpool);

The Rust bindgen script is:

# Due to cryptographic secrets, deriving Debug is absolutely forbidden.
# Some resources are non-copyable non-clonable:
# - Threadpools
# - Contexts holding sessions
bindgen \
    include/constantine.h \
    -o constantine-rust/constantine-sys/src/bindings.rs \
    --default-enum-style rust \
    --use-core \
    --no-derive-debug \
    --default-visibility private \
    --enable-function-attribute-detection \
    -- -Iinclude

The Engine is a thin wrapper: here it holds just the threadpool, but for other backends it could be a CUDA or FPGA context manager. It is provided as follows:

pub struct CttEngine {
    ctx: *mut ctt_threadpool,
}

impl CttEngine {
    #[inline(always)]
    pub fn new(num_threads: usize) -> CttEngine {
        let ctx = unsafe { ctt_threadpool_new(num_threads) };
        CttEngine { ctx }
    }
}

impl Drop for CttEngine {
    fn drop(&mut self) {
        unsafe { ctt_threadpool_shutdown(self.ctx) }
    }
}

impl ZalEngine for CttEngine {}
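
To illustrate the ownership pattern used by CttEngine above, here is a minimal sketch that runs without linking Constantine. MockPool, mock_threadpool_new, mock_threadpool_shutdown, MockEngine and live_pools are hypothetical stand-ins, not part of Constantine or ZAL; they only mimic the shape of the real FFI calls to show that Drop releases the handle exactly once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the opaque `ctt_threadpool` handle.
pub struct MockPool {
    num_threads: usize,
}

/// Counts handles that were created but not yet shut down.
static LIVE_POOLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for `ctt_threadpool_new`: returns an owning raw pointer.
unsafe fn mock_threadpool_new(num_threads: usize) -> *mut MockPool {
    LIVE_POOLS.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(MockPool { num_threads }))
}

/// Stand-in for `ctt_threadpool_shutdown`: frees the handle.
unsafe fn mock_threadpool_shutdown(tp: *mut MockPool) {
    // Reclaim the allocation made in mock_threadpool_new.
    drop(unsafe { Box::from_raw(tp) });
    LIVE_POOLS.fetch_sub(1, Ordering::SeqCst);
}

/// Same shape as CttEngine: owns the raw handle, no Clone, no Copy.
pub struct MockEngine {
    ctx: *mut MockPool,
}

impl MockEngine {
    pub fn new(num_threads: usize) -> Self {
        let ctx = unsafe { mock_threadpool_new(num_threads) };
        MockEngine { ctx }
    }

    pub fn num_threads(&self) -> usize {
        // The pointer stays valid for as long as the engine is alive.
        unsafe { (*self.ctx).num_threads }
    }
}

impl Drop for MockEngine {
    fn drop(&mut self) {
        // Runs exactly once, when the engine goes out of scope.
        unsafe { mock_threadpool_shutdown(self.ctx) }
    }
}

/// Number of pools currently alive (for demonstration only).
pub fn live_pools() -> usize {
    LIVE_POOLS.load(Ordering::SeqCst)
}
```

Because the wrapper stores a raw pointer and derives neither Clone nor Copy, it cannot be duplicated, and Rust does not automatically mark it Send or Sync. That matches the bindgen note that threadpools are non-copyable and non-clonable.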

The MsmAccel trait is implemented as follows:

use core::mem::{self, MaybeUninit};

impl MsmAccel<bn256::G1Affine> for CttEngine {
    fn msm(&self, coeffs: &[bn256::Fr], bases: &[bn256::G1Affine]) -> bn256::G1 {
        assert_eq!(coeffs.len(), bases.len());
        let mut result: MaybeUninit<bn254_snarks_g1_prj> = MaybeUninit::uninit();
        // The pointer casts and the transmute below assume that bn256::Fr,
        // bn256::G1Affine and bn256::G1 have the same size and layout as
        // bn254_snarks_fr, bn254_snarks_g1_aff and bn254_snarks_g1_prj.
        unsafe {
            ctt_bn254_snarks_g1_prj_multi_scalar_mul_fr_coefs_vartime_parallel(
                self.ctx,
                result.as_mut_ptr(),
                coeffs.as_ptr() as *const bn254_snarks_fr,
                bases.as_ptr() as *const bn254_snarks_g1_aff,
                bases.len(),
            );
            mem::transmute::<MaybeUninit<bn254_snarks_g1_prj>, bn256::G1>(result)
        }
    }
}
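
For readers less familiar with MSM: given scalars c_i and points P_i, it computes the sum of c_i * P_i. The sketch below shows that contract on a toy additive group (integers modulo a small prime), not on BN254. ToyMsmAccel, NaiveEngine and scalar_mul are hypothetical names that only mirror the shape of the msm method above; an optimized backend would typically use a bucket-based algorithm instead of this naive loop.

```rust
/// Modulus of the toy group (integers under addition mod P).
/// This is NOT an elliptic curve; it only illustrates the MSM contract.
const P: u64 = 1_000_003;

/// Hypothetical trait shaped like the `msm` method of `MsmAccel`.
pub trait ToyMsmAccel {
    fn msm(&self, coeffs: &[u64], bases: &[u64]) -> u64;
}

/// Reference backend with no acceleration at all.
pub struct NaiveEngine;

/// Computes k * base in the toy group with double-and-add.
fn scalar_mul(mut k: u64, base: u64) -> u64 {
    let mut acc = 0u64; // group identity
    let mut addend = base % P;
    while k > 0 {
        if k & 1 == 1 {
            acc = (acc + addend) % P; // "point addition"
        }
        addend = (addend + addend) % P; // "point doubling"
        k >>= 1;
    }
    acc
}

impl ToyMsmAccel for NaiveEngine {
    fn msm(&self, coeffs: &[u64], bases: &[u64]) -> u64 {
        // Same precondition as the real implementation above.
        assert_eq!(coeffs.len(), bases.len());
        coeffs
            .iter()
            .zip(bases)
            .fold(0, |acc, (&c, &b)| (acc + scalar_mul(c, b)) % P)
    }
}
```

Any engine implementing the same trait can be swapped in by the caller, which is the point of an acceleration layer: the prover depends on the trait, not on a specific backend.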

Note on linking

For maximum performance, cross-language LTO between Nim and Rust is used.

Nim is compiled as a static library with Clang and ThinLTO.
Rust is compiled with ThinLTO as well, and LLD is used as the linker through:

rustflags="-Clinker-plugin-lto -Clinker=clang -Clink-arg=-fuse-ld=lld"

On macOS, Apple Clang does not support Intel assembly syntax because it is missing an upstream LLVM commit. Installing LLVM/Clang through Homebrew fixes that.

mratsim (Owner, Author) commented Dec 2, 2023

CI error: https://github.com/mratsim/constantine/actions/runs/7069440808/job/19245124465?pr=308#step:24:98

= note: ld.lld: error: /home/runner/work/constantine/constantine/constantine/target/debug/build/crossbeam-utils-302f4860eb614cb5/build_script_build-302f4860eb614cb5.build_script_build.2c9005dc9d044489-cgu.0.rcgu.o: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM17.0.4-rust-1.74.0-stable' Reader: 'LLVM 14.0.0')
clang: error: linker command failed with exit code 1 (use -v to see invocation)

LLVM 15+ enables opaque pointers by default.
It seems that lld or Clang in CI comes from LLVM 14 rather than LLVM 17, the version Rust 1.74.0 was built with.
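
The mismatch can be read directly from the Producer and Reader tags in the error message. As a rough sketch, the helper below extracts the LLVM major version from such tags and compares them. llvm_major and reader_can_load are hypothetical helpers, and the rule "the reader should be at least as new as the producer" is a simplifying assumption, not an official compatibility guarantee.

```rust
/// Extracts the LLVM major version from tags such as
/// "LLVM17.0.4-rust-1.74.0-stable" or "LLVM 14.0.0".
/// Returns None when the tag does not start with "LLVM" plus a number.
pub fn llvm_major(tag: &str) -> Option<u32> {
    let rest = tag.trim().strip_prefix("LLVM")?.trim_start();
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Assumed rule of thumb: the linker's LLVM (reader) should be at least
/// as new as the LLVM that produced the bitcode (producer).
pub fn reader_can_load(producer: &str, reader: &str) -> Option<bool> {
    Some(llvm_major(reader)? >= llvm_major(producer)?)
}
```

Applied to the tags in the CI error, the producer is LLVM 17 and the reader is LLVM 14, so the check fails, which is consistent with the linker error.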

mratsim (Owner, Author) commented Dec 4, 2023

The last remaining intermittent issue is that some macOS runners do not support ADX instructions (the error is SIGILL), and Rust halo2curves does not support runtime CPU feature detection.

  nim -v
  gcc -v
  clang -v
  rustup --version
  if [[ 'amd64' != 'i386' && 'macOS' != 'Windows' ]]; then
    llvm-config --version
  fi
  if [[ 'macOS' == 'Linux' ]]; then
    cat /proc/cpuinfo
  fi
  if [[ 'macOS' == 'macOS' ]]; then
    sysctl -a | grep machdep.cpu
    sysctl -a | grep hw | grep cpu
    sysctl -a | grep hw.optional
  fi
  shell: /bin/bash --noprofile --norc -e -o pipefail {0}
Nim Compiler Version 1.6.17 [MacOSX: amd64]
Compiled at 2023-11-19
Copyright (c) 2006-2023 by Andreas Rumpf

git hash: f3382743dda90506cebeb54c1c2c0e5488bbf74f
active boot switches: -d:release
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Homebrew clang version 15.0.7
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /usr/local/opt/llvm@15/bin
info: This is the version for the rustup toolchain manager, not the rustc compiler.
rustup 1.26.0 (2023-04-05)
info: The currently active `rustc` version is `rustc 1.74.0 (79e9716c9 2023-11-13)`
15.0.7
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.linesize_max: 4096
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.sub_Cstates: 16
machdep.cpu.thermal.sensor: 0
machdep.cpu.thermal.dynamic_acceleration: 0
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.thresholds: 0
machdep.cpu.thermal.ACNT_MCNT: 0
machdep.cpu.thermal.core_power_limits: 0
machdep.cpu.thermal.fine_grain_clock_mod: 0
machdep.cpu.thermal.package_thermal_intr: 0
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.energy_policy: 0
hw.perflevel0.cpusperl2: 1
hw.perflevel0.cpusperl3: 3
hw.physicalcpu: 3
hw.physicalcpu_max: 3
hw.logicalcpu: 3
hw.logicalcpu_max: 3
hw.cputype: 7
hw.cpusubtype: 4
hw.cpu64bit_capable: 1
hw.cpufamily: 526772277
hw.cpusubfamily: 0
hw.cpufrequency: 3337000000
hw.cpufrequency_min: 3337000000
hw.cpufrequency_max: 3337000000
hw.optional.floatingpoint: 1
hw.optional.mmx: 1
hw.optional.sse: 1
hw.optional.sse2: 1
hw.optional.sse3: 1
hw.optional.supplementalsse3: 1
hw.optional.sse4_1: 1
hw.optional.sse4_2: 1
hw.optional.x86_64: 1
hw.optional.aes: 1
hw.optional.avx1_0: 1
hw.optional.rdrand: 1
hw.optional.f16c: 1
hw.optional.enfstrg: 0
hw.optional.fma: 0
hw.optional.avx2_0: 0
hw.optional.bmi1: 0
hw.optional.bmi2: 0
hw.optional.rtm: 0
hw.optional.hle: 0
hw.optional.adx: 0
hw.optional.mpx: 0
hw.optional.sgx: 0
hw.optional.avx512f: 0
hw.optional.avx512cd: 0
hw.optional.avx512dq: 0
hw.optional.avx512bw: 0
hw.optional.avx512vl: 0
hw.optional.avx512ifma: 0
hw.optional.avx512vbmi: 0
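
The log above reports hw.optional.adx: 0 and hw.optional.bmi2: 0 on this runner, so any ADX code path executed unconditionally will trap with SIGILL. As a sketch of what runtime detection could look like in Rust, the standard library's is_x86_feature_detected! macro can gate the choice of backend. has_adx_bmi2 and pick_backend, along with the backend names, are hypothetical and are not part of halo2curves or Constantine.

```rust
/// True when the running CPU reports both ADX and BMI2 (x86 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn has_adx_bmi2() -> bool {
    std::is_x86_feature_detected!("adx") && std::is_x86_feature_detected!("bmi2")
}

/// On other architectures these x86 extensions never exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn has_adx_bmi2() -> bool {
    false
}

/// Chooses a code path at runtime instead of at compile time.
/// The returned names are placeholders for real backend selection.
pub fn pick_backend() -> &'static str {
    if has_adx_bmi2() {
        "asm-adx"
    } else {
        "portable"
    }
}
```

With a check of this kind, a runner like the one logged above would take the portable path rather than fault on an unsupported instruction.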

@mratsim mratsim merged commit 78159b5 into master Dec 4, 2023
16 checks passed
@mratsim mratsim deleted the zal branch December 4, 2023 11:59