feat(ecc): VectorField Fq Mont-mul + K=5 MSM batch_affine_add by notnotraju · Pull Request #23353 · AztecProtocol/aztec-packages

notnotraju · 2026-05-17T22:15:04Z

Stacked on top of #23210 (rk/wasm-simd-03-accumulator).

Two commits:

1. VectorField Fq Mont-mul specialization: extracts the Mont-mul body into vector_field_mont_mul_body.inl.hpp and adds an explicit specialization for Bn254FqParams alongside the existing Bn254FrParams one. Each specialization remains a separate TU function, which preserves register scope so that V8 reproduces the gist's hand-scheduled WAT. 9 new VectorFieldFqTest cases mirror the Fr coverage.
2. K=5 q1s1 path in batch_affine_add_interleaved: uses the new Fq specialization to run 5 independent batch-inversion chains in parallel through MSM's affine-add inner loop. Per group of 5 pairs (10 points), 30 scalar muls collapse to 6 width-5 vec muls (+ 12 amortized split-tree muls), for an asymptotic ~5× kernel speedup on the mul work.
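To illustrate what "5 independent batch-inversion chains in parallel" can look like, here is a minimal sketch of Montgomery's batch-inversion trick run lane-wise over K=5 chains. It is not the PR's code: the toy prime P, the plain modular arithmetic (no Montgomery form), the lane_mul helper standing in for one width-5 vector multiply, and the assignment of element j to chain j % K are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime modulus (assumption); the PR works over BN254 Fq in Montgomery form.
constexpr std::uint64_t P = 1'000'000'007ULL;

constexpr std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) { return (a * b) % P; }

// Square-and-multiply exponentiation mod P.
constexpr std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t acc = 1;
    while (exp != 0) {
        if (exp & 1) acc = mul_mod(acc, base);
        base = mul_mod(base, base);
        exp >>= 1;
    }
    return acc;
}

// Fermat inverse; valid for nonzero a because P is prime.
constexpr std::uint64_t inv_mod(std::uint64_t a) { return pow_mod(a, P - 2); }

constexpr std::size_t K = 5;
using Lanes = std::array<std::uint64_t, K>;

// Lane-wise multiply: hypothetical stand-in for one width-5 vector Mont-mul.
Lanes lane_mul(const Lanes& a, const Lanes& b) {
    Lanes out{};
    for (std::size_t i = 0; i < K; ++i) out[i] = mul_mod(a[i], b[i]);
    return out;
}

// Row r holds one element from each of the K chains.
Lanes load_row(const std::vector<std::uint64_t>& v, std::size_t r) {
    Lanes row{};
    for (std::size_t i = 0; i < K; ++i) row[i] = v[r * K + i];
    return row;
}

// Inverts every element in place. Element j belongs to chain j % K.
// Requires a nonzero multiple of K elements, all nonzero mod P.
void batch_invert_k_chains(std::vector<std::uint64_t>& values) {
    assert(!values.empty() && values.size() % K == 0);
    const std::size_t rows = values.size() / K;

    // Forward pass: prefix[r] is, per lane, the product of rows 0..r-1.
    std::vector<Lanes> prefix(rows);
    Lanes running;
    running.fill(1);
    for (std::size_t r = 0; r < rows; ++r) {
        prefix[r] = running;
        running = lane_mul(running, load_row(values, r));
    }

    // One true inversion per chain (K in total, independent of rows).
    Lanes inv{};
    for (std::size_t i = 0; i < K; ++i) inv[i] = inv_mod(running[i]);

    // Backward pass: peel one row off the running inverse per step.
    for (std::size_t r = rows; r-- > 0;) {
        const Lanes row = load_row(values, r);
        const Lanes result = lane_mul(inv, prefix[r]);  // inverse of row r
        inv = lane_mul(inv, row);                       // inverse of rows 0..r-1
        for (std::size_t i = 0; i < K; ++i) values[r * K + i] = result[i];
    }
}
```

In this sketch each pass issues one lane_mul per row of 5 elements instead of 5 scalar multiplies, which is the source of the ~5× reduction in multiply instructions quoted above; the exact multiply counts in the PR depend on details not shown here.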

Dispatch: __wasm_simd128__ && Fq == bb::fq && num_points >= 20. Below threshold, on native, or on non-BN254 curves: falls through to the original K=1 path unchanged.
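The dispatch rule can be sketched as a compile-time plus run-time check. This is a hedged illustration only: FakeBn254Fq, OtherField, Path, select_path and K5_MIN_POINTS are hypothetical names, and the real code compares against bb::fq rather than a stand-in type.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct FakeBn254Fq {};  // hypothetical stand-in for bb::fq
struct OtherField {};   // hypothetical non-BN254 base field

// Threshold taken from the PR description (num_points >= 20).
constexpr std::size_t K5_MIN_POINTS = 20;

enum class Path { K1, K5 };

// Returns which kernel would run for a given field type and point count.
template <typename Fq>
constexpr Path select_path(std::size_t num_points) {
#if defined(__wasm_simd128__)
    // Only compiled when targeting WASM with SIMD enabled.
    if constexpr (std::is_same_v<Fq, FakeBn254Fq>) {
        if (num_points >= K5_MIN_POINTS) {
            return Path::K5;
        }
    }
#endif
    (void)num_points;  // unused on native builds
    return Path::K1;   // original scalar path, unchanged
}

// Non-BN254 fields and small inputs take the K=1 path on every target.
static_assert(select_path<OtherField>(1000) == Path::K1, "non-BN254 falls through");
static_assert(select_path<FakeBn254Fq>(19) == Path::K1, "below threshold falls through");
```

On a native build the preprocessor removes the K=5 branch entirely, so select_path<FakeBn254Fq>(1000) also returns Path::K1 there, matching the "K=5 dormant" behavior described for native tests.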

Includes snapshot-before-write logic: output slot for one lane can alias the input slot of a later lane in the same group (typical for large MSM bucket sums); buffering all 5 lanes' reads before any writes prevents y3 corruption.
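The aliasing hazard can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not the PR's code: LaneJob, the slot layout and the use of integer addition in place of the affine-add arithmetic are all hypothetical. It contrasts a read-then-write-per-lane ordering with a snapshot of all 5 lanes' inputs before any write.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t K = 5;

struct LaneJob {
    std::size_t lhs;  // index of the first input slot
    std::size_t rhs;  // index of the second input slot
    std::size_t out;  // index of the output slot; may be a later lane's input
};

// Hazardous ordering: lane i writes before lane i+1 has read its inputs.
void run_group_interleaved(std::vector<long>& slots, const std::array<LaneJob, K>& jobs) {
    for (const LaneJob& job : jobs) {
        slots[job.out] = slots[job.lhs] + slots[job.rhs];
    }
}

// Snapshot-before-write: buffer every lane's inputs, then perform all writes.
void run_group_snapshot(std::vector<long>& slots, const std::array<LaneJob, K>& jobs) {
    std::array<long, K> lhs{};
    std::array<long, K> rhs{};
    for (std::size_t i = 0; i < K; ++i) {
        assert(jobs[i].lhs < slots.size() && jobs[i].rhs < slots.size());
        lhs[i] = slots[jobs[i].lhs];
        rhs[i] = slots[jobs[i].rhs];
    }
    for (std::size_t i = 0; i < K; ++i) {
        assert(jobs[i].out < slots.size());
        slots[jobs[i].out] = lhs[i] + rhs[i];
    }
}

// Example group in which each lane's output slot is the next lane's first input.
std::array<LaneJob, K> chained_jobs() {
    return {{{0, 1, 2}, {2, 3, 4}, {4, 5, 6}, {6, 7, 8}, {8, 9, 10}}};
}
```

With slots {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 0} and chained_jobs(), the snapshot version computes slot 4 as 4 + 8 = 12 from the original inputs, while the interleaved version computes 3 + 8 = 11 because lane 0 has already overwritten slot 2.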

Why this exists

The V8 chonk breakdown shows MSM evaluate_work_units is ~50% of WASM proving time, and batch_affine_add_interleaved is its workhorse. Artem's PR #23004 hits the same surface at width-2 via paired-fp51 Mont-mul; per the Slack microbench discussion, the q1s1 (5-wide) kernel wins per-mul by ~50% over fp51 at width ≥ 4. This PR is the first consumer of that width advantage in MSM. The kernel is deterministic across engines because it uses integer SIMD rather than relaxed-SIMD, so it avoids the Edge 147 / Safari class of bugs.

End-to-end measurement to follow (microbench + chonk under V8/Node + BrowserStack matrix). Marking draft.

Tests

Native ecc_tests: 865/865 PASS (K=5 dormant; K=1 fallback intact)
WASM ecc_tests under wasmtime: 865/905 PASS, 40 SKIPPED (intentional), 0 FAILED — K=5 actively exercised

Stack

rk/wasm-simd-01-vector-field → rk/wasm-simd-02-vectorized-for → rk/wasm-simd-03-accumulator → this PR

Lifts the operator* WASM kernel body into vector_field_mont_mul_body.inl.hpp and stamps it for both Bn254FrParams and Bn254FqParams. The macros (BB_VF_LOAD_LIMBS, BB_VF_KARATSUBA_STAGES_1_4, BB_VF_RUN_STAGES_6_THROUGH_10) already reference unqualified R_INV_WASM / P_WASM / R_INV_MOD_2_29; those resolve in each specialization's enclosing class scope to the appropriate Params constants, so the same body produces a correctly-bound kernel per Params. Each specialization remains explicit (rather than templating the body) so LLVM emits each as a standalone TU function, preserving the register-scope that lets V8 reproduce the gist's hand-scheduled WAT layout.

New VectorFieldFqTest suite (9 tests) mirrors the Fr coverage for the operations exercised by curve arithmetic: ctor, add, sub, mul (150 random trials), eq, is_zero, distributivity, mul-by-one, type alias. Verified native ecc_tests 35/35 and wasm ecc_tests under wasmtime 35/35 PASS.

Prereq for MSM-side q1s1 integration in subsequent PRs.
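The name-resolution trick described in this commit message can be shown on a toy field. The sketch below is illustrative only: ToyFrParams, ToyFqParams, ToyField, P_TOY and TOY_MUL_BODY are hypothetical names, and a one-line modular multiply stands in for the real Mont-mul body that lives in the .inl.hpp file.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameter structs standing in for Bn254FrParams / Bn254FqParams.
struct ToyFrParams { static constexpr std::uint32_t modulus = 97; };
struct ToyFqParams { static constexpr std::uint32_t modulus = 101; };

// Shared body. P_TOY is deliberately unqualified: it is looked up at the
// point of expansion, i.e. in the scope of whichever class uses the macro.
#define TOY_MUL_BODY(a, b) \
    (static_cast<std::uint32_t>((static_cast<std::uint64_t>(a) * (b)) % P_TOY))

// Primary template is declared but never defined; only the explicit
// specializations below exist, so each operator* is a separate function.
template <typename Params> struct ToyField;

template <> struct ToyField<ToyFrParams> {
    static constexpr std::uint32_t P_TOY = ToyFrParams::modulus;
    std::uint32_t value;
    ToyField operator*(const ToyField& other) const {
        return {TOY_MUL_BODY(value, other.value)};  // binds to modulus 97
    }
};

template <> struct ToyField<ToyFqParams> {
    static constexpr std::uint32_t P_TOY = ToyFqParams::modulus;
    std::uint32_t value;
    ToyField operator*(const ToyField& other) const {
        return {TOY_MUL_BODY(value, other.value)};  // binds to modulus 101
    }
};
```

The same macro text yields 150 mod 97 = 53 in the first specialization and 150 mod 101 = 49 in the second when multiplying 50 by 3, which mirrors how one shared body can bind to different Params constants without being templated.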

Width-5 fast path for batch_affine_add_interleaved, using the VectorField<Bn254FqParams> Mont-mul from the prior commit. Runs 5 independent batch-inversion chains in parallel, collapses each pass's N scalar muls into N/5 width-5 vec muls (asymptotic ~5×).

Dispatch: __wasm_simd128__ && Fq == bb::fq && num_points >= 20. Below threshold or on native, falls through to the original K=1 path unchanged.

Snapshot-before-write per group: output slot for one lane can alias the input slot of a later lane in the same group; buffering all 5 lanes' reads before any writes prevents y3 corruption at large N.

Tests: ecc_tests 37/37 PASS native + wasmtime (K=5 exercised under wasmtime).

notnotraju added 2 commits May 13, 2026 02:26

notnotraju wants to merge 2 commits into rk/wasm-simd-03-accumulator from rk/wasm-simd-04-fq-mont-mul
