From 45b1bfe8ed431f1eb81714c19d8d6b6fc620b852 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 20 May 2026 12:22:49 +0000
Subject: [PATCH 1/2] =?UTF-8?q?ci(simd):=20Phase=206=20=E2=80=94=20AVX-512?=
 =?UTF-8?q?=20dispatch=20check=20job?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 6 of the integration plan in
`.claude/knowledge/simd-dispatch-architecture.md`. Adds a new
`tier4-avx512-check` job that compiles the crate with
`-Ctarget-cpu=x86-64-v4` so the `#[cfg(target_feature = "avx512f")]`
dispatch arm in `src/simd.rs` is compile-checked on every PR. Without
this the AVX-512 code path bit-rots under the v3 default (the
`x86-64-v3` baseline in `.cargo/config.toml`): it only compiles when a
developer happens to build locally with
`--config .cargo/config-avx512.toml`.

Implementation notes
--------------------

* `cargo check` instead of `cargo test`/`cargo build`. GH-hosted
  `ubuntu-latest` runners land on a mix of Azure VM SKUs (D-series),
  only some of which have AVX-512 silicon, so a v4-baked test binary
  would SIGILL on a non-AVX-512 host. `check` runs type and borrow
  checking without codegen, so it produces no runnable artifact, yet it
  still catches the dispatch-arm type mismatches that motivated this PR
  series in the first place (the PR #170 CI failure mode). Errors that
  only surface during codegen (post-monomorphization) are out of its
  reach.

* Job-level `env: RUSTFLAGS: "-D warnings -Ctarget-cpu=x86-64-v4"`
  overrides the global `RUSTFLAGS="-D warnings"` set at the top of
  `ci.yaml`. The v4 flag has to live in the env var itself: while
  `RUSTFLAGS` is set, cargo ignores rustflags from config files, so
  `--config .cargo/config-avx512.toml` would have no effect (the same
  issue that broke PR #172 with the v3 setting in `config.toml`).

* Two check passes: `approx,serde,rayon`, and the same set plus
  `hpc-extras`. The latter pulls in the p64/fractal dependency tree,
  which exercises a different slice of the AVX-512 code paths (BF16
  RNE, AMX byte-asm). Each pass takes ~30 s with a Swatinem cache hit,
  ~3 min cold.

* Added to the `conclusion` job's `needs` list so a v4 check failure
  blocks merge.
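A minimal sketch of why the AVX-512 arm bit-rots under the v3 default.
The function name `active_arm` and its three arms are hypothetical
stand-ins, not the real contents of `src/simd.rs`. Items behind a
`#[cfg]` that evaluates to false are stripped before type checking
(they only need to parse), so a type error in the `avx512f` arm only
surfaces in a build that enables that feature, which is what the
`-Ctarget-cpu=x86-64-v4` job provides.

```rust
// Hypothetical stand-in for the dispatch in `src/simd.rs`.
// Exactly one `active_arm` survives cfg-stripping; the other definitions
// are removed before type checking, so a type error inside a disabled arm
// is only reported by a build that enables that target feature.

#[cfg(target_feature = "avx512f")]
fn active_arm() -> &'static str {
    "avx512f"
}

#[cfg(all(target_feature = "avx2", not(target_feature = "avx512f")))]
fn active_arm() -> &'static str {
    "avx2"
}

#[cfg(not(any(target_feature = "avx2", target_feature = "avx512f")))]
fn active_arm() -> &'static str {
    "scalar"
}

fn main() {
    let arm = active_arm();
    // `cfg!` evaluates the same predicate as a compile-time bool.
    assert_eq!(arm == "avx512f", cfg!(target_feature = "avx512f"));
    println!("compiled dispatch arm: {arm}");
}
```

Built with `-Ctarget-cpu=x86-64-v3` this selects the `avx2` arm; with
`-Ctarget-cpu=x86-64-v4` it selects the `avx512f` arm.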
---
 .github/workflows/ci.yaml | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
index e416599a..4c2c9a37 100644
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@@ -171,6 +171,38 @@ jobs:
       - name: clippy --features rayon
         run: cargo clippy -p ndarray --features rayon --lib -- -D warnings
 
+  tier4-avx512-check:
+    # Phase 6 of the SIMD integration plan (.claude/knowledge/
+    # simd-dispatch-architecture.md). Exercises the AVX-512 dispatch
+    # arm (`#[cfg(target_feature = "avx512f")]` in `src/simd.rs`) so it
+    # doesn't bit-rot under the v3-default cargo config.
+    #
+    # `cargo check` rather than `cargo test`: GH-hosted `ubuntu-latest`
+    # runners may not have AVX-512 silicon (intermittent across SKUs),
+    # so a v4-baked binary would SIGILL at run time. `check` compiles
+    # the AVX-512 code path through the type checker + borrow checker
+    # (no codegen, so nothing runnable is produced) and still
+    # catches type mismatches and dispatch-arm holes that the v3
+    # default never touches.
+    #
+    # The job-level `RUSTFLAGS` env overrides the global
+    # `RUSTFLAGS="-D warnings"` set at the top of this file so the v4
+    # target-cpu actually applies. Without the override, `.cargo/
+    # config-avx512.toml`'s rustflags would be ignored (env wins over
+    # config file in cargo's precedence).
+    runs-on: ubuntu-latest
+    name: tier4-avx512-check
+    env:
+      RUSTFLAGS: "-D warnings -Ctarget-cpu=x86-64-v4"
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+      - uses: Swatinem/rust-cache@v2
+      - name: cargo check (v4 / AVX-512 dispatch arm)
+        run: cargo check -p ndarray --features approx,serde,rayon
+      - name: cargo check (v4 / AVX-512 + hpc-extras)
+        run: cargo check -p ndarray --features approx,serde,rayon,hpc-extras
+
   blas-msrv:
     runs-on: ubuntu-latest
     name: blas-msrv
@@ -269,6 +301,7 @@ jobs:
       - tests
       - native-backend
       - hpc-stream-parallel
+      - tier4-avx512-check
       - miri
       - cross_test
       - cargo-careful

From f9d127ade0d525188db1c56cd7eb6d73366e99bc Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 20 May 2026 12:30:55 +0000
Subject: [PATCH 2/2] =?UTF-8?q?fix(ci):=20Phase=206=20=E2=80=94=20split=20?=
 =?UTF-8?q?target=20rustflags=20from=20build-script=20rustflags?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous iteration of tier4-avx512-check set
`RUSTFLAGS: "-D warnings -Ctarget-cpu=x86-64-v4"` as a job-level env
without passing `--target`. In that mode RUSTFLAGS applies to BOTH the
target compilation AND host artifacts: build scripts (`build.rs`
binaries cargo runs natively) and proc macros. On a GH-hosted runner
without AVX-512 silicon, those v4-baked build scripts SIGILL during
dependency compilation; the job exited after 23 s, before our own crate
even started compiling.

Fix: keep the job-level RUSTFLAGS and pass an explicit
`--target=x86_64-unknown-linux-gnu`. When `--target` is given, cargo
passes rustflags only to artifacts built for that target; build scripts
and proc macros are built for the host without them, even when host and
target share the triple. Result: v4 reaches our crate (and its
dependencies), baseline reaches build scripts.

Moving the flags to `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS`
instead would not work here: cargo's rustflags sources are mutually
exclusive, and the `RUSTFLAGS` env var (the global "-D warnings" set at
the top of `ci.yaml`, inherited by every job) takes precedence over
`target.<triple>.rustflags`, so the v4 flag would be silently dropped.

Cargo doc reference (the `--target` caveat and the precedence order of
the rustflags sources):
https://doc.rust-lang.org/cargo/reference/config.html#buildrustflags
---
 .github/workflows/ci.yaml | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
index 4c2c9a37..75d085ac 100644
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@@ -185,23 +185,31 @@ jobs:
     # catches type mismatches and dispatch-arm holes that the v3
     # default never touches.
     #
-    # The job-level `RUSTFLAGS` env overrides the global
-    # `RUSTFLAGS="-D warnings"` set at the top of this file so the v4
-    # target-cpu actually applies. Without the override, `.cargo/
-    # config-avx512.toml`'s rustflags would be ignored (env wins over
-    # config file in cargo's precedence).
+    # Why job-level `RUSTFLAGS` plus an explicit `--target`:
+    #
+    # The first iteration set `RUSTFLAGS` without `--target` and failed
+    # in ~23 s. Without `--target`, RUSTFLAGS applies to BOTH the target
+    # compilation AND host artifacts (build scripts, proc macros), so on
+    # a GH runner without AVX-512 silicon v4-baked build scripts SIGILL.
+    #
+    # With an explicit `--target`, cargo passes rustflags only to
+    # artifacts built for that triple, even when host == target. The v4
+    # flag must come from the job-level `RUSTFLAGS` override: any
+    # `RUSTFLAGS` (including the global one) takes precedence over
+    # config-file and `CARGO_TARGET_<triple>_RUSTFLAGS` flags. Result:
+    # v4 for our crate and its deps, baseline for build scripts.
     runs-on: ubuntu-latest
     name: tier4-avx512-check
     env:
       RUSTFLAGS: "-D warnings -Ctarget-cpu=x86-64-v4"
     steps:
       - uses: actions/checkout@v4
       - uses: dtolnay/rust-toolchain@stable
       - uses: Swatinem/rust-cache@v2
       - name: cargo check (v4 / AVX-512 dispatch arm)
-        run: cargo check -p ndarray --features approx,serde,rayon
+        run: cargo check --target=x86_64-unknown-linux-gnu -p ndarray --features approx,serde,rayon
       - name: cargo check (v4 / AVX-512 + hpc-extras)
-        run: cargo check -p ndarray --features approx,serde,rayon,hpc-extras
+        run: cargo check --target=x86_64-unknown-linux-gnu -p ndarray --features approx,serde,rayon,hpc-extras
 
   blas-msrv:
     runs-on: ubuntu-latest