Last update: 2023-12-04
This document is aimed at software and hardware proof system accelerators providers.
It documents the steps taken in Constantine to support the candidate ZAL (Zk Accel API) proposed in privacy-scaling-explorations/halo2#216 "[RFC] Blackboxing MSM and FFT - Hardware Accel API".
📝
NoteConstantine is written in the Nim programming language.
Nim compiles to machine code through C hence code produced by Nim has the C ABI of the compiler/OS used.In short this guide is applicable for C or C++ libraries.
Building Constantine with ZAL supports requires:
- Clang
- LLVM
- Nim
- Rust
All of those are available through your package manager on MacOS or Linux.
Commands for testing:
git clone https://github.com/mratsim/constantine
cd constantine
cargo test
cargo bench
As of 2023-12-04, the candidate API is proposed in https://github.com/privacy-scaling-explorations/halo2curves/pull/107/files#diff-e746a6a49bd01b5c3241e440d309f5a5e38e583aa9a2eaa2c97419fdc1a3104aR42
pub trait ZalEngine {}
pub trait MsmAccel<C: CurveAffine>: ZalEngine {
fn msm(&self, coeffs: &[C::Scalar], base: &[C]) -> C::Curve;
}
A similar trait will likely be proposed for FFTs and coset FFTs
Initialization of the ZalEngine is at the moment not enforced. It is recommended that ZAL backends provide:
- an initialization function:
- either "fn new() -> ZalEngine" for simple libraries
- or a builder pattern for complex initializations
- a shutdown function wrapped in a Drop trait.
The ZalEngine can be a stub type and the shutdown function might be unnecessary if the ZalEngine uses a global threadpool like Rayon.
Backends might want to add as an option:
- The number of threads (CPU)
- The device(s) to run on (multi-sockets machines, multi-GPUs machines, ...)
- The curve (JIT-compiled backend)
- ...
Wrapping the C code can be autogenerated from Rust bindgen
cargo install bindgen-cli
Constantine uses the following scripts/gen_rust_bindings.sh
bindgen \
include/constantine.h \
-o constantine-rust/constantine-sys/src/bindings.rs \
--default-enum-style rust \
--use-core \
--no-derive-debug \
--default-visibility private \
--enable-function-attribute-detection \
-- -Iinclude
The headers for MSM are at include/curves/bn254_snarks_parallel.h
void ctt_bn254_snarks_g1_prj_multi_scalar_mul_fr_coefs_vartime_parallel(const ctt_threadpool* tp, bn254_snarks_g1_prj* r, const bn254_snarks_fr coefs[], const bn254_snarks_g1_aff points[], size_t len);
The Rust wrapping is in constantine-rust/constantine-zal-halo2kzg/src/lib.rs
use ::core::mem::MaybeUninit;
use constantine_sys::*;
use halo2curves::bn256;
use halo2curves::zal::{MsmAccel, ZalEngine};
use std::mem;
pub struct CttEngine {
ctx: *mut ctt_threadpool,
}
impl CttEngine {
#[inline(always)]
pub fn new(num_threads: usize) -> CttEngine {
let ctx = unsafe { ctt_threadpool_new(num_threads) };
CttEngine { ctx }
}
}
impl Drop for CttEngine {
fn drop(&mut self) {
unsafe { ctt_threadpool_shutdown(self.ctx) }
}
}
impl ZalEngine for CttEngine {}
impl MsmAccel<bn256::G1Affine> for CttEngine {
fn msm(&self, coeffs: &[bn256::Fr], bases: &[bn256::G1Affine]) -> bn256::G1 {
assert_eq!(coeffs.len(), bases.len());
let mut result: MaybeUninit<bn254_snarks_g1_prj> = MaybeUninit::uninit();
unsafe {
ctt_bn254_snarks_g1_prj_multi_scalar_mul_fr_coefs_vartime_parallel(
self.ctx,
result.as_mut_ptr(),
coeffs.as_ptr() as *const bn254_snarks_fr,
bases.as_ptr() as *const bn254_snarks_g1_aff,
bases.len(),
);
mem::transmute::<MaybeUninit<bn254_snarks_g1_prj>, bn256::G1>(result)
}
}
}
And testing
#[cfg(test)]
mod tests {
use super::*;
use ark_std::{end_timer, start_timer};
use rand_core::OsRng;
use halo2curves::bn256;
use halo2curves::ff::Field;
use halo2curves::group::prime::PrimeCurveAffine;
use halo2curves::group::{Curve, Group};
use halo2curves::msm::msm_best;
use halo2curves::zal::MsmAccel;
#[test]
fn t_threadpool() {
let tp = CttEngine::new(4);
drop(tp);
}
fn run_msm_zal(min_k: usize, max_k: usize) {
let points = (0..1 << max_k)
.map(|_| bn256::G1::random(OsRng))
.collect::<Vec<_>>();
let mut affine_points = vec![bn256::G1Affine::identity(); 1 << max_k];
bn256::G1::batch_normalize(&points[..], &mut affine_points[..]);
let points = affine_points;
let scalars = (0..1 << max_k)
.map(|_| bn256::Fr::random(OsRng))
.collect::<Vec<_>>();
for k in min_k..=max_k {
let points = &points[..1 << k];
let scalars = &scalars[..1 << k];
let t0 = start_timer!(|| format!("freestanding msm k={}", k));
let e0 = msm_best(scalars, points);
end_timer!(t0);
let engine = CttEngine::new(num_cpus::get());
let t1 = start_timer!(|| format!("CttEngine msm k={}", k));
let e1 = engine.msm(scalars, points);
end_timer!(t1);
assert_eq!(e0, e1);
}
}
#[test]
fn t_msm_zal() {
run_msm_zal(3, 14);
}
}
- ZalEngine is an opaque pointer that you can use for anything (or nothing for the builtin H2cEngine)
- coeffs/scalars are:
- field elements in the range [0,r) in Montgomery domain, with
r
the prime order of the curve. For A in Montgomery and a in canonical domain we haveA = aR (mod r)
, R being 2²⁵⁶ for BN254 - halo2curves uses 64-bit words, word-endianness is machine-endian (i.e. little endian on x86, ARM, RISC-V)
- Big numbers are split into 64-bit limbs, least significant limb first, i.e. limb-endianness is little endian
- each takes 32 bytes
- field elements in the range [0,r) in Montgomery domain, with
- points/bases are:
- group elements / short Weierstrass elliptic curve points, in affine coordinates representation (NOT jacobian) Coordinates are ordered (x, y)
- each takes 64 bytes
- result is:
-
a group element / short Weierstrass elliptic curve points, in projective coordinates representation (NOT jacobian)
Converting from Jacobian to Projective is explained here constantine/math/ec_shortweierstrass.nim
For affine coordinates (x, y):
- Projective coordinates have the form (X, Y, Z) with x = X/Z and y = Y/Z
- Jacobian coordinates have the form (X, Y, Z) with X = X/Z² and y = Y/Z³
func projectiveFromJacobian*[F; G]( prj: var ECP_ShortW_Prj[F, G], jac: ECP_ShortW_Jac[F, G]) {.inline.} = prj.x.prod(jac.x, jac.z) prj.y = jac.y prj.z.square(jac.z) prj.z *= jac.z
Coordinates are ordered (X, Y, Z)
-
Result requires 96 bytes
-