Skip to content

PR-X2 Worker B: soa_struct! #[soa(pad_to_lanes = N)] field attribute#169

Merged
AdaWorldAPI merged 2 commits into
masterfrom
claude/pr-x2-pad-to-lanes
May 20, 2026
Merged

PR-X2 Worker B: soa_struct! #[soa(pad_to_lanes = N)] field attribute#169
AdaWorldAPI merged 2 commits into
masterfrom
claude/pr-x2-pad-to-lanes

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

Summary

PR-X2 Worker B per .claude/knowledge/pr-x2-design.md § "Worker decomposition" line 459. Adds the #[soa(pad_to_lanes = N)] field attribute to soa_struct! so SIMD-staged kernels can walk each padded field with one uniform N-lane loop — no scalar tail-case branch.

1 file, +231 / -19 vs master. Companion to Worker A (#168, merged at fb95cb32).

Macro surface

soa_struct! {
    pub struct Cells {
        #[soa(pad_to_lanes = 8)]
        pub palette: u8,
        pub label: u32,           // unpadded
    }
}

let mut c = Cells::new();
c.push(7, 100);
assert_eq!(c.len(), 1);           // logical row count
assert_eq!(c.palette.len(), 8);   // physical, rounded to lane 8
assert_eq!(c.label.len(), 1);     // unpadded: physical == logical

Padded tail is filled with <$ty as Default>::default() — caller-specified sentinels are out of scope (design § "Open questions Q4", deferred follow-up).

Implementation

  • Optional $(#[soa(pad_to_lanes = $pad:literal)])? per field in the macro_rules! head — stable since Rust 1.32.
  • Generated struct grows a private _logical_len: usize. len() / is_empty() return the semantic row count, independent of any field's lane padding.
  • push() dispatches per-field through internal @push_field arms:
    • padded arm: grows the Vec to (logical + 1).div_ceil(N) * N filling with Default::default(), then writes the value at [logical]
    • plain arm: the pre-PR-X2 Vec::push(val) call
    • Dispatch via macro_rules! tt-munching with literal-token separators (pad = $pad) so a single repetition handles both shapes.
  • Compile-time guard: const { assert!($pad > 0) } inside the padded arm — pad_to_lanes = 0 is rejected at expansion time, not runtime.
  • with_capacity(cap) reserves cap per field but does not pre-pad; padding happens lazily on push.
  • clear() resets _logical_len + .clear() per field. Re-pushing rebuilds padding from scratch.

Breaking change

len() no longer mirrors self.<field>.len() after direct field mutation — e.g. s.x.push(...) bypasses _logical_len and produces a stale logical count. The canonical entry point is the macro-generated push.

The existing macro_public_visibility_passthrough test was updated to use push (it had pushed directly into each field, which still works for visibility but no longer increments len()).

New tests (5)

  • pad_to_lanes_single_push_grows_to_lane — mixed cadence (lane 8 + lane 16 + unpadded)
  • pad_to_lanes_crosses_lane_boundary — 9 pushes against lane 8 → physical 16
  • pad_to_lanes_clear_resets_bothclear() round-trip
  • pad_to_lanes_uniform_cadence — all-padded variant
  • pad_to_lanes_with_capacity_empty — empty state invariants

Plus a # Example — #[soa(pad_to_lanes = N)] field attribute doctest on the soa_struct! macro itself.

Verified locally

  • cargo test -p ndarray --lib hpc::soa38 passed (was 33; +5 padded)
  • cargo test --doc -p ndarray hpc::soa14 passed (was 13; +1 padded doctest)
  • cargo fmt --check — clean
  • cargo clippy --features approx,serde,rayon -- -D warnings — clean

Out of scope

  • Caller-specified sentinel for the padded tail (design § "Open questions Q4") — deferred.
  • {field_name}_padded_len() convenience accessor (design § Worker decomposition line 459) — skipped because fields are pub and self.<field>.len() already gives the physical length without ceremony.

🤖 Generated with Claude Code


Generated by Claude Code

Worker B of PR-X2 per .claude/knowledge/pr-x2-design.md § "Worker
decomposition" line 459. SIMD-staged kernels need each SoA field's
underlying Vec to be a multiple of the lane width N so the consumer
walks the buffer with one uniform N-lane loop — no scalar tail-case
branch. Pre-PR-X2 callers achieved this by hand (W3-W6
GaussianBatch::with_capacity + eager-zero fill); this PR makes it
declarative on the field.

Macro surface:

  soa_struct! {
      pub struct Cells {
          #[soa(pad_to_lanes = 8)]
          pub palette: u8,
          pub label: u32,           // unpadded
      }
  }

  let mut c = Cells::new();
  c.push(7, 100);
  assert_eq!(c.len(), 1);           // logical row count
  assert_eq!(c.palette.len(), 8);   // physical, rounded to lane 8
  assert_eq!(c.label.len(), 1);     // unpadded: physical == logical

Implementation:
  - Added optional `$(#[soa(pad_to_lanes = $pad:literal)])?` per field
    in the macro_rules! head — Rust 1.32+ optional-meta repetition.
  - Generated struct grows a private `_logical_len: usize` so `len()` /
    `is_empty()` return the **semantic** row count independent of any
    field's lane padding.
  - `push()` dispatches per-field through internal `@push_field` arms:
      • padded arm grows the Vec to `(logical+1).div_ceil(N)*N` filling
        with `<$ty as Default>::default()`, then writes the new value
        at `[logical]`
      • plain arm is the pre-PR-X2 `Vec::push` call
    The dispatch uses macro_rules! tt-munching with literal-token
    separators (`pad = $pad`) so a single repetition handles both shapes.
  - Compile-time guard: `const { assert!($pad > 0) }` inside the padded
    arm — `pad_to_lanes = 0` is rejected at expansion, not at runtime.
  - `with_capacity(cap)` reserves `cap` on each field but does NOT
    pre-pad; padding happens lazily on push (matches the original
    `with_capacity` semantics modulo the lane-tail).
  - `clear()` resets _logical_len + .clear() on each field. Re-pushing
    rebuilds padding from scratch.

Breaking change: `len()` no longer mirrors `self.<field>.len()` after
direct field mutation (e.g. `s.x.push(...)` bypasses `_logical_len`).
The canonical entry point is the macro-generated `push`. Pre-existing
`macro_public_visibility_passthrough` test updated to use `push`.

New tests (`src/hpc/soa.rs`, 5 added):
  - pad_to_lanes_single_push_grows_to_lane    — mixed cadence 8+16+none
  - pad_to_lanes_crosses_lane_boundary        — 9 pushes against lane 8
  - pad_to_lanes_clear_resets_both            — clear() round-trips
  - pad_to_lanes_uniform_cadence              — all-padded variant
  - pad_to_lanes_with_capacity_empty          — empty state invariants

Plus a `# Example — #[soa(pad_to_lanes = N)] field attribute` doctest
on the `soa_struct!` macro itself.

Verified:
  cargo test -p ndarray --lib hpc::soa                          38 passed
  cargo test --doc -p ndarray hpc::soa                          14 passed
  cargo fmt --check                                              clean
  cargo clippy --features approx,serde,rayon -- -D warnings      clean
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ee172aa84

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/hpc/soa.rs
P1 codex review on PR #169: unconditional `_logical_len` injection
turned previously all-public generated structs into structs with a
private field, breaking downstream code that constructs them via
struct literals (`MyBatch { a: vec![], b: vec![] }`) and exhaustive
pattern matches — even for callers who never use `#[soa(pad_to_lanes
= N)]`.

Fix: split `soa_struct!` into two arms.

Arm 1 — unpadded (no `#[soa(...)]` anywhere): byte-for-byte the
pre-PR-X2 emit. No `_logical_len` field. `len()` reads field lengths
under `debug_assert`. Struct-literal construction works exactly as
before. macro_rules! tries this arm first; if any field carries a
`#[soa(pad_to_lanes = N)]` attribute, the no-attribute pattern fails
to match and the macro falls through to arm 2.

Arm 2 — padded (at least one `#[soa(...)]`): the new PR-X2 B emit
with `_logical_len` + lane-padded `push`. Reached only when the
user opted in by tagging a field — struct-literal construction was
never going to work for these anyway because the user can't fill
the lane-padded `Vec` slots themselves.

The pre-existing `macro_public_visibility_passthrough` test is
restored to its original form (direct `s.x.push(...)` on the
unpadded `Soa3`) since arm 1 is byte-identical to pre-PR-X2.

Verified:
  cargo test -p ndarray --lib hpc::soa                          38 passed
  cargo test --doc -p ndarray hpc::soa                          14 passed
  cargo fmt --check                                              clean
  cargo clippy --features approx,serde,rayon -- -D warnings      clean

Resolves PR #169 P1 review thread r3272307622.
Copy link
Copy Markdown
Owner Author

@codex review

P1 thread r3272307622 addressed in 9d492db5 by splitting soa_struct! into two arms — arm 1 (no #[soa(...)] anywhere) is byte-for-byte the pre-PR-X2 emit so struct-literal construction stays sound for existing callers; arm 2 (any padding) emits _logical_len only for opt-in users. Please audit the new arm dispatch.


Generated by Claude Code

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@AdaWorldAPI AdaWorldAPI merged commit 94494bf into master May 20, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants