feat(simd): Phase 1 - explicit cargo configs + AVX2 dispatch hardening #172
Implements Phase 1 of the integration plan in `.claude/knowledge/simd-dispatch-architecture.md` (PR #171).

Changes
-------

1. `.cargo/config.toml`: set `target-cpu = "x86-64-v3"` for x86_64. Previously the file declared "no global target-cpu", which compiled binaries to x86-64 generic (SSE2). `simd_avx2::F32x16` and friends wrap `__m256` / `__m256i` intrinsics that the runtime CPU never executes under SSE2, producing the PR #170 SIGILL CI mode (38 tests timing out uniformly at ~19s in `simd_avx2::*` / `simd_ops::*` / `simd_soa::*`).
2. `.cargo/config-avx512.toml` (new): explicit `x86-64-v4` for AVX-512 builds. Triggered by `cargo --config .cargo/config-avx512.toml`.
3. `.cargo/config-native.toml` (new): `target-cpu = "native"` for build-host-tuned binaries (developer machines). Non-portable.
4. `src/simd.rs`: tighten the AVX2 dispatch arm predicate from `not(target_feature = "avx512f")` to `target_feature = "avx2" + not(target_feature = "avx512f")`. Belts-and-braces: under v3 the predicates are equivalent, but the explicit `avx2` requirement means a future "build me without v3" invocation lands on a compile error rather than a SIGILL at run time. Stale "target-cpu=x86-64-v4 → AVX-512" comment refreshed to describe the new three-config dispatch model.

Out of scope for this PR
------------------------

The architecture doc (PR #171) claimed Phase 1 also needed to "add ~10 missing AVX2 two-half wrappers". On survey those wrappers already exist in `src/simd_avx2.rs`:

- `F32x16` / `F64x8`: true two-half AVX wrappers
- `U8x32`: native AVX2 `__m256i`
- `U8x64` / `I8x64` / `I16x32` / `I32x16` / `I64x8` / `U16x32` / `U32x16` / `U64x8`: scalar polyfill via the `avx2_int_type!` macro (storage = `[$elem; $lanes]` align 64).

The matrix in the architecture doc will be corrected as a follow-up. The parity gap that does exist (scalar-polyfill ints are not vectorized under AVX2) is its own piece of tech debt, tracked separately.
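As a rough illustration of what the three configs change, the sketch below reports which tier the compiler was told to target. It relies only on the standard `cfg!` macro; the function name `compiled_simd_tier` and the tier labels are hypothetical and do not come from this repository.

```rust
/// Hypothetical helper (not part of this crate): names the SIMD tier that the
/// compile-time `target_feature` cfgs select for the current build.
pub const fn compiled_simd_tier() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx512f")) {
        // Expected under `target-cpu = "x86-64-v4"`, or "native" on an AVX-512 host.
        "avx512"
    } else if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        // Expected under `target-cpu = "x86-64-v3"`.
        "avx2"
    } else if cfg!(target_arch = "x86_64") {
        // No target-cpu set: generic x86-64 (SSE2 only).
        "x86_64-baseline"
    } else {
        "non-x86_64"
    }
}
```

Printing this value from a test or a build log is one way to check which config actually took effect for a given invocation.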
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0d0031812d
    pub use crate::simd_avx512::{f32x8, f64x4, i16x16, i8x32, F32x8, F64x4, I16x16, I8x32};

    - #[cfg(all(target_arch = "x86_64", not(target_feature = "avx512f")))]
    + #[cfg(all(target_arch = "x86_64", target_feature = "avx2", not(target_feature = "avx512f")))]
Restore x86_64 fallback when AVX2 is unavailable
Requiring target_feature = "avx2" on the x86_64 re-export arm removes all F32x16/F64x8/integer SIMD type exports for x86_64 builds that are not compiled with AVX2 (for example downstream users building this crate with default x86_64 flags or x86-64-v2). Because this file defines unconditional APIs like simd_exp_f32(x: F32x16), those builds now fail at compile time due to missing type definitions instead of falling back; .cargo/config.toml in this repo does not protect dependency builds in other workspaces.
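The failure mode described above can be sketched with stub backends. Everything here is illustrative: the `*_stub` modules, the `BACKEND` constant, and `simd_double` are hypothetical stand-ins, not this crate's real modules or API. The point is only that an unconditional function taking `F32x16` compiles on every target when the `cfg` arms are exhaustive; deleting the third arm reproduces the "missing type" error for baseline x86_64 builds.

```rust
// Stamp out three stub backends that expose the same type name.
macro_rules! stub_backend {
    ($module:ident, $label:literal) => {
        #[allow(dead_code)]
        pub mod $module {
            #[derive(Clone, Copy, Debug, PartialEq)]
            pub struct F32x16(pub [f32; 16]);

            impl F32x16 {
                pub const BACKEND: &'static str = $label;
                pub fn splat(v: f32) -> Self {
                    F32x16([v; 16])
                }
            }
        }
    };
}

stub_backend!(avx512_stub, "avx512");
stub_backend!(avx2_stub, "avx2");
stub_backend!(fallback_stub, "fallback");

// Arm 1: AVX-512 builds.
#[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))]
pub use self::avx512_stub::F32x16;

// Arm 2: the tightened predicate (AVX2 required, AVX-512 absent).
#[cfg(all(target_arch = "x86_64", target_feature = "avx2", not(target_feature = "avx512f")))]
pub use self::avx2_stub::F32x16;

// Arm 3: everything the first two arms do not cover. Without this arm,
// baseline x86_64 builds have no `F32x16` in scope at all.
#[cfg(not(all(
    target_arch = "x86_64",
    any(target_feature = "avx2", target_feature = "avx512f")
)))]
pub use self::fallback_stub::F32x16;

/// An unconditional API, analogous in shape to a function that takes `F32x16`.
pub fn simd_double(x: F32x16) -> F32x16 {
    let mut out = x.0;
    for v in out.iter_mut() {
        *v *= 2.0;
    }
    F32x16(out)
}
```

Whether a tightened predicate plus an explicit fallback arm would suit this crate is a separate design question; the sketch only shows why the tightened predicate on its own removes the type for some builds.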
Fixed in e3ad707 (already on the merged branch) — reverted the target_feature = "avx2" predicate tightening for exactly this reason. Same root cause surfaced in our CI: RUSTFLAGS="-D warnings" env in ci.yaml overrides .cargo/config.toml rustflags entirely (cargo doesn't merge — env wins), so even our own GitHub runner landed on x86-64 baseline without target_feature = "avx2" set, leaving no matching arm → consumer references to crate::simd::F32x16 failed to compile.
Predicate is back to not(avx512f). Per-function #[target_feature(enable = "avx,avx2,fma")] annotations inside simd_avx2.rs gate the actual intrinsic execution at the symbol level; the struct-field types (__m256 / __m256i) are core::arch declarations that don't require AVX/AVX2 at the type level. Downstream consumers building this crate with default x86_64 flags or x86-64-v2 now keep their type exports.
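A minimal sketch of the per-function gating pattern described here, assuming a hypothetical `dot` kernel rather than anything in `simd_avx2.rs`. The wrapper struct compiles at the x86-64 baseline because `__m256` is only a type; the AVX2/FMA instructions are confined to the function annotated with `#[target_feature]`. The runtime check via `is_x86_feature_detected!` is an addition for safety in this sketch and is not something the thread says the crate does.

```rust
/// Declaring a struct around `__m256` needs no AVX at compile time.
#[cfg(target_arch = "x86_64")]
#[allow(dead_code)]
pub struct F32x8Wrapper(pub std::arch::x86_64::__m256);

/// AVX2 + FMA kernel. The attribute enables the features for this symbol only.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,avx2,fma")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    let mut i = 0;
    let mut lanes = [0.0f32; 8];
    // SAFETY: every load reads 8 floats starting at index i with i + 8 <= n,
    // and the store writes exactly 8 floats into `lanes`.
    unsafe {
        let mut acc = _mm256_setzero_ps();
        while i + 8 <= n {
            let va = _mm256_loadu_ps(a.as_ptr().add(i));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i));
            acc = _mm256_fmadd_ps(va, vb, acc); // acc += va * vb
            i += 8;
        }
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    let mut total: f32 = lanes.iter().sum();
    // Scalar tail for the remaining (n % 8) elements.
    while i < n {
        total += a[i] * b[i];
        i += 1;
    }
    total
}

/// Safe entry point: uses the AVX2 kernel only when the running CPU has it.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: the required CPU features were just detected at run time.
            return unsafe { dot_avx2(a, b) };
        }
    }
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Calling an annotated function without such a check on a CPU that lacks the feature is what produces a SIGILL, so the annotation alone gates code generation, not execution.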
Generated by Claude Code
… v3 config

The previous commit tightened the x86_64 dispatch arm to `target_feature = "avx2" + not(avx512f)`. The intent was to make "x86-64 baseline + AVX2 wrappers" a compile error rather than a SIGILL.

CI disagreed: `.github/workflows/ci.yaml` sets a global `RUSTFLAGS="-D warnings"` env that overrides the rustflags from `.cargo/config.toml` entirely (cargo doesn't merge env + config rustflags; env wins). So in CI the v3 baseline never takes effect, x86-64 generic / SSE2 is what builds, `target_feature = "avx2"` is not set, and the tightened arm leaves no matching dispatch path → consumer references to `crate::simd::F32x16` fail to compile.

The pre-existing wider `not(avx512f)` predicate works at x86-64 baseline because the inner intrinsics in `simd_avx2.rs` use per-function `#[target_feature(enable = "avx,avx2,fma")]` annotations. The ops gate themselves at the symbol level, while struct fields like `__m256` / `__m256i` are `core::arch` type declarations that don't require AVX/AVX2 at the type level (only at execution).

Reverting the predicate. The cargo configs added in the previous commit stay: they're the documented opt-in affordances. Local `cargo build` without env override gets v3; CI runs at baseline + per-function target_feature; explicit AVX-512 via `--config .cargo/config-avx512.toml`.
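The "env wins, no merge" behavior can be modeled as a first-match lookup. The sketch below is a simplified model written for illustration, not Cargo's implementation: the struct, its field names, and `effective_rustflags` are hypothetical, and it collapses all matching `[target.*]` tables into a single optional list.

```rust
/// Simplified model of where rustflags can come from (names are hypothetical).
#[derive(Default)]
pub struct RustflagsSources<'a> {
    /// `CARGO_ENCODED_RUSTFLAGS` env var, arguments separated by 0x1f.
    pub encoded_env: Option<&'a str>,
    /// `RUSTFLAGS` env var, arguments separated by whitespace.
    pub rustflags_env: Option<&'a str>,
    /// `rustflags` from matching `[target.*]` tables in the cargo config.
    pub target_config: Option<Vec<&'a str>>,
    /// `rustflags` from the `[build]` table in the cargo config.
    pub build_config: Option<Vec<&'a str>>,
}

/// The first source that is present is used alone; later sources are ignored.
pub fn effective_rustflags(src: &RustflagsSources<'_>) -> Vec<String> {
    if let Some(encoded) = src.encoded_env {
        return encoded.split('\u{1f}').map(String::from).collect();
    }
    if let Some(env) = src.rustflags_env {
        return env.split_whitespace().map(String::from).collect();
    }
    if let Some(flags) = &src.target_config {
        return flags.iter().map(|s| s.to_string()).collect();
    }
    if let Some(flags) = &src.build_config {
        return flags.iter().map(|s| s.to_string()).collect();
    }
    Vec::new()
}
```

With `rustflags_env` set to `-D warnings` and a config that supplies `-C target-cpu=x86-64-v3`, the model returns only `-D warnings`, matching the CI behavior described above. If CI should build at v3, one option is to put the `-C target-cpu=...` flag into the `RUSTFLAGS` value itself.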
Summary
Phase 1 of the integration plan in `.claude/knowledge/simd-dispatch-architecture.md` (PR #171).

- `.cargo/config.toml`: pin `target-cpu = "x86-64-v3"` for x86_64 (was empty → SSE2 baseline → `__m256` / `__m256i` intrinsics inside `simd_avx2::F32x16` / `F64x8` / int wrappers ran under SSE2 → SIGILL on the GH runner, PR #170's 38-test uniform-timeout failure mode).
- `.cargo/config-avx512.toml` (new): explicit `x86-64-v4` for AVX-512 builds.
- `.cargo/config-native.toml` (new): `target-cpu = "native"` for developer-machine builds.
- `src/simd.rs`: tighten the AVX2 dispatch arm predicate from `not(avx512f)` to `avx2 + not(avx512f)`. Refresh the stale "target-cpu=x86-64-v4 → AVX-512" comment block to describe the new three-config dispatch model.

Scope correction vs PR #171
The architecture doc listed "add ~10 missing AVX2 two-half wrappers" as Phase 1 work. On survey, those wrappers already exist in `src/simd_avx2.rs`:

- `F32x16` / `F64x8`: true two-half AVX wrappers (`(f32x8, f32x8)` / `(f64x4, f64x4)`)
- `U8x32`: native AVX2 `__m256i`
- `U8x64` / `I8x64` / `I16x32` / `I32x16` / `I64x8` / `U16x32` / `U32x16` / `U64x8`: scalar polyfills via the `avx2_int_type!` macro (storage = `[$elem; $lanes]`, align 64)

The matrix entry in #171's parity table will be corrected on the doc branch as a follow-up. The remaining gap (AVX2 int wrappers are scalar-polyfill, not vectorized) is its own piece of tech debt and is not in Phase 1's scope.
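To make the two wrapper shapes concrete, here is a hedged sketch of each. It is not the code in `src/simd_avx2.rs`: `polyfill_int_type!` is a hypothetical stand-in for `avx2_int_type!`, `Half8` stands in for `f32x8`, and wrapping addition is an assumed semantics chosen for the example.

```rust
use std::ops::Add;

/// Scalar polyfill: a plain array with 64-byte alignment, operated on lane by lane.
macro_rules! polyfill_int_type {
    ($name:ident, $elem:ty, $lanes:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        #[repr(align(64))]
        pub struct $name(pub [$elem; $lanes]);

        impl $name {
            pub const LANES: usize = $lanes;
            pub fn splat(v: $elem) -> Self {
                $name([v; $lanes])
            }
        }

        impl Add for $name {
            type Output = Self;
            fn add(self, rhs: Self) -> Self {
                let mut out = self.0;
                // One scalar op per lane; no vector instructions are requested.
                for i in 0..$lanes {
                    out[i] = out[i].wrapping_add(rhs.0[i]);
                }
                $name(out)
            }
        }
    };
}

polyfill_int_type!(I32x16, i32, 16);
polyfill_int_type!(U8x64, u8, 64);

/// Stand-in for an 8-lane `f32` vector (the real type would wrap `__m256`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Half8(pub [f32; 8]);

impl Add for Half8 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = self.0;
        for i in 0..8 {
            out[i] += rhs.0[i];
        }
        Half8(out)
    }
}

/// Two-half wrapper: a 16-lane type built from two 8-lane halves.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x16(pub Half8, pub Half8);

impl Add for F32x16 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        // Each operation is forwarded to both halves.
        F32x16(self.0 + rhs.0, self.1 + rhs.1)
    }
}
```

The difference in the remaining gap is visible in the structure: the two-half wrapper forwards to a type that can be backed by a 256-bit register, while the polyfill has no vector type underneath it.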
Test plan
- `cargo build`: default config → builds with `x86-64-v3` (AVX2)
- `cargo --config .cargo/config-avx512.toml build`: AVX-512 build path
- `cargo --config .cargo/config-native.toml build`: native build path
- `simd_avx2::*` / `simd_ops::*` / `simd_soa::*` no longer SIGILL on the AVX2-only GitHub runner

Generated by Claude Code