Skip to content

Latest commit

 

History

History
235 lines (186 loc) · 7.13 KB

zk_accel_layer.md

File metadata and controls

235 lines (186 loc) · 7.13 KB

ZK Accel layer

Last update: 2023-12-04

This document is aimed at software and hardware proof system accelerators providers.

It documents the steps taken in Constantine to support the candidate ZAL (Zk Accel API) proposed in privacy-scaling-explorations/halo2#216 "[RFC] Blackboxing MSM and FFT - Hardware Accel API".

📝 Note

Constantine is written in the Nim programming language.
Nim compiles to machine code through C hence code produced by Nim has the C ABI of the compiler/OS used.

In short this guide is applicable for C or C++ libraries.

Testing

Building Constantine with ZAL supports requires:

  • Clang
  • LLVM
  • Nim
  • Rust

All of those are available through your package manager on MacOS or Linux.

Commands for testing:

git clone https://github.com/mratsim/constantine
cd constantine
cargo test
cargo bench

API

As of 2023-12-04, the candidate API is proposed in https://github.com/privacy-scaling-explorations/halo2curves/pull/107/files#diff-e746a6a49bd01b5c3241e440d309f5a5e38e583aa9a2eaa2c97419fdc1a3104aR42

pub trait ZalEngine {}

pub trait MsmAccel<C: CurveAffine>: ZalEngine {
    fn msm(&self, coeffs: &[C::Scalar], base: &[C]) -> C::Curve;
}

A similar trait will likely be proposed for FFTs and coset FFTs

Initialization of the ZalEngine is at the moment not enforced. It is recommended that ZAL backends provide:

  • an initialization function:
    • either "fn new() -> ZalEngine" for simple libraries
    • or a builder pattern for complex initializations
  • a shutdown function wrapped in a Drop trait.

The ZalEngine can be a stub type and the shutdown function might be unnecessary if the ZalEngine uses a global threadpool like Rayon.

Backends might want to add as an option:

  • The number of threads (CPU)
  • The device(s) to run on (multi-sockets machines, multi-GPUs machines, ...)
  • The curve (JIT-compiled backend)
  • ...

C Backend

Wrapping the C code can be autogenerated from Rust bindgen

cargo install bindgen-cli

Constantine uses the following scripts/gen_rust_bindings.sh

bindgen \
  include/constantine.h \
  -o constantine-rust/constantine-sys/src/bindings.rs \
  --default-enum-style rust \
  --use-core \
  --no-derive-debug \
  --default-visibility private \
  --enable-function-attribute-detection \
  -- -Iinclude

The headers for MSM are at include/curves/bn254_snarks_parallel.h

void ctt_bn254_snarks_g1_prj_multi_scalar_mul_fr_coefs_vartime_parallel(const ctt_threadpool* tp, bn254_snarks_g1_prj* r, const bn254_snarks_fr coefs[], const bn254_snarks_g1_aff points[], size_t len);

The Rust wrapping is in constantine-rust/constantine-zal-halo2kzg/src/lib.rs

use ::core::mem::MaybeUninit;
use constantine_sys::*;
use halo2curves::bn256;
use halo2curves::zal::{MsmAccel, ZalEngine};
use std::mem;

pub struct CttEngine {
    ctx: *mut ctt_threadpool,
}

impl CttEngine {
    #[inline(always)]
    pub fn new(num_threads: usize) -> CttEngine {
        let ctx = unsafe { ctt_threadpool_new(num_threads) };
        CttEngine { ctx }
    }
}

impl Drop for CttEngine {
    fn drop(&mut self) {
        unsafe { ctt_threadpool_shutdown(self.ctx) }
    }
}

impl ZalEngine for CttEngine {}

impl MsmAccel<bn256::G1Affine> for CttEngine {
    fn msm(&self, coeffs: &[bn256::Fr], bases: &[bn256::G1Affine]) -> bn256::G1 {
        assert_eq!(coeffs.len(), bases.len());
        let mut result: MaybeUninit<bn254_snarks_g1_prj> = MaybeUninit::uninit();
        unsafe {
            ctt_bn254_snarks_g1_prj_multi_scalar_mul_fr_coefs_vartime_parallel(
                self.ctx,
                result.as_mut_ptr(),
                coeffs.as_ptr() as *const bn254_snarks_fr,
                bases.as_ptr() as *const bn254_snarks_g1_aff,
                bases.len(),
            );
            mem::transmute::<MaybeUninit<bn254_snarks_g1_prj>, bn256::G1>(result)
        }
    }
}

And testing

#[cfg(test)]
mod tests {
    use super::*;

    use ark_std::{end_timer, start_timer};
    use rand_core::OsRng;

    use halo2curves::bn256;
    use halo2curves::ff::Field;
    use halo2curves::group::prime::PrimeCurveAffine;
    use halo2curves::group::{Curve, Group};
    use halo2curves::msm::msm_best;
    use halo2curves::zal::MsmAccel;

    #[test]
    fn t_threadpool() {
        let tp = CttEngine::new(4);
        drop(tp);
    }

    fn run_msm_zal(min_k: usize, max_k: usize) {
        let points = (0..1 << max_k)
            .map(|_| bn256::G1::random(OsRng))
            .collect::<Vec<_>>();
        let mut affine_points = vec![bn256::G1Affine::identity(); 1 << max_k];
        bn256::G1::batch_normalize(&points[..], &mut affine_points[..]);
        let points = affine_points;

        let scalars = (0..1 << max_k)
            .map(|_| bn256::Fr::random(OsRng))
            .collect::<Vec<_>>();

        for k in min_k..=max_k {
            let points = &points[..1 << k];
            let scalars = &scalars[..1 << k];

            let t0 = start_timer!(|| format!("freestanding msm k={}", k));
            let e0 = msm_best(scalars, points);
            end_timer!(t0);

            let engine = CttEngine::new(num_cpus::get());
            let t1 = start_timer!(|| format!("CttEngine msm k={}", k));
            let e1 = engine.msm(scalars, points);
            end_timer!(t1);

            assert_eq!(e0, e1);
        }
    }

    #[test]
    fn t_msm_zal() {
        run_msm_zal(3, 14);
    }
}

ABI description

  • ZalEngine is an opaque pointer that you can use for anything (or nothing for the builtin H2cEngine)
  • coeffs/scalars are:
    • field elements in the range [0,r) in Montgomery domain, with r the prime order of the curve. For A in Montgomery and a in canonical domain we have A = aR (mod r), R being 2²⁵⁶ for BN254
    • halo2curves uses 64-bit words, word-endianness is machine-endian (i.e. little endian on x86, ARM, RISC-V)
    • Big numbers are split into 64-bit limbs, least significant limb first, i.e. limb-endianness is little endian
    • each takes 32 bytes
  • points/bases are:
    • group elements / short Weierstrass elliptic curve points, in affine coordinates representation (NOT jacobian) Coordinates are ordered (x, y)
    • each takes 64 bytes
  • result is:
    • a group element / short Weierstrass elliptic curve points, in projective coordinates representation (NOT jacobian)

      Converting from Jacobian to Projective is explained here constantine/math/ec_shortweierstrass.nim

      For affine coordinates (x, y):

      • Projective coordinates have the form (X, Y, Z) with x = X/Z and y = Y/Z
      • Jacobian coordinates have the form (X, Y, Z) with X = X/Z² and y = Y/Z³
      func projectiveFromJacobian*[F; G](
            prj: var ECP_ShortW_Prj[F, G],
            jac: ECP_ShortW_Jac[F, G]) {.inline.} =
        prj.x.prod(jac.x, jac.z)
        prj.y = jac.y
        prj.z.square(jac.z)
        prj.z *= jac.z

      Coordinates are ordered (X, Y, Z)

    • Result requires 96 bytes