impl(sprint-12/wave-F): D-CSV-11 vertical streaming scaffolds (QualiaStream + InferenceStream + SplatFieldStream)#147
Per user-requested clippy-first + ephemeral-nightly discipline. The repo keeps stable as its only toolchain (rust-toolchain.toml = 1.95.0); nightly exists ONLY for Miri because (a) Miri ships nightly-only, (b) the SIMD polyfill at src/simd_nightly/ requires #![feature(portable_simd)], and (c) Miri can execute core::simd but treats _mm*_* intrinsics as opaque.

Changes:

1. scripts/miri-tests.sh: rewritten to make the ephemeral-nightly discipline explicit. Invokes `cargo +nightly miri nextest run` (the +nightly is a per-invocation switch; the default toolchain stays stable). Idempotent `rustup component add miri --toolchain nightly` so fresh checkouts auto-install. Now passes `--features approx,serde,nightly-simd` so the polyfill paths (not the intrinsics) are actually exercised under Miri. Header comment names the rule: this is the only nightly entry point in the repo.
2. src/hpc/simd_caps.rs: `cfg(miri)` variant of `SimdCaps::detect` that returns an all-false capability set, bypassing the `__cpuid_count` inline-asm call (which Miri cannot execute; it panics with "unsupported operation: inline assembly is not supported"). Production builds and stable CI are unaffected; only `cargo +nightly miri test` takes the new path. Result: any test that reaches the `simd_caps()` LazyLock under Miri now exercises the scalar fallback paths instead of aborting.

Verification (cargo +nightly miri test ephemerally):

- simd_nightly::tests::*: 153 passed / 0 failed (26s).
- hpc::byte_scan::tests::*: 18 passed / 0 failed (15s).
- `cargo check -p ndarray` on stable (default toolchain) still passes clean; no regression on the production path.

If Miri stays clean over the next sprint, the matching CI job at .github/workflows/ci.yaml § miri can be promoted from optional (only on merge_group / push to main) → required on PR checks.
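The `cfg(miri)` bypass described in change 2 can be sketched as follows. This is a minimal illustration, not the code in `src/hpc/simd_caps.rs`: the `SimdCaps` fields, the `kernel_name` helper, and the use of `is_x86_feature_detected!` in place of the `__cpuid_count` call are assumptions made for the example.

```rust
use std::sync::LazyLock;

/// Hypothetical capability set; the real `SimdCaps` fields are not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct SimdCaps {
    pub avx2: bool,
    pub fma: bool,
}

impl SimdCaps {
    /// Under Miri: report no SIMD support, so callers take the scalar
    /// fallbacks and no CPUID inline assembly is ever reached.
    #[cfg(miri)]
    pub fn detect() -> Self {
        Self::default()
    }

    /// Normal x86_64 builds: runtime detection (assumed stand-in for the
    /// repo's own CPUID-based probe).
    #[cfg(all(not(miri), target_arch = "x86_64"))]
    pub fn detect() -> Self {
        Self {
            avx2: std::is_x86_feature_detected!("avx2"),
            fma: std::is_x86_feature_detected!("fma"),
        }
    }

    /// Other architectures: nothing to detect in this sketch.
    #[cfg(all(not(miri), not(target_arch = "x86_64")))]
    pub fn detect() -> Self {
        Self::default()
    }
}

/// Detection runs once, on first access.
static CAPS: LazyLock<SimdCaps> = LazyLock::new(SimdCaps::detect);

pub fn simd_caps() -> SimdCaps {
    *CAPS
}

/// Hypothetical dispatch helper: names the path a kernel would take.
pub fn kernel_name() -> &'static str {
    if simd_caps().avx2 { "avx2" } else { "scalar" }
}
```

Under `cargo +nightly miri test` the first `detect` is the one compiled in, so `kernel_name()` would always report the scalar path there.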
The previous sweep aborted at test 93/2407 because `hpc::activations` (and many other hpc::* modules) import `crate::simd::F32x16` directly, which re-exports from `simd_avx2::*` / `simd_avx512::*`: code paths that call `_mm256_set1_ps` and similar intrinsics. Miri rejects these with "calling a function that requires unavailable target features: avx" because Miri's compilation target does NOT enable AVX/AVX2/AVX-512 target features.

Architectural finding (already documented in src/simd.rs:212): the `nightly-simd` feature ships a parallel `crate::simd_nightly` module (the 30-type `core::simd` polyfill) which IS Miri-checkable, but the default `crate::simd::*` dispatch is NOT routed through it. Consumer modules importing `crate::simd::F32x16` go through intrinsics.

Until the polyfill expands to full type-parity (5/30 today) AND `crate::simd::*` gains a cfg(miri) dispatch through it, this script constrains the sweep to the surface that does not pull `_mm*` intrinsics:

* `simd_nightly::tests::*`: the polyfill itself (153 tests, clean)
* `hpc::byte_scan::*`: exercises scalar fallback through SimdCaps
* `hpc::framebuffer::*`: same
* everything else not in `hpc::*`

`--no-fail-fast` so a single regression doesn't mask the rest.
…amid
The previous filter only excluded `hpc::*` (minus byte_scan / framebuffer). A test run surfaced two more boundaries:

1. `simd::tests::*` (the suite under src/simd.rs) exercises `crate::simd::F32x16` directly, the same intrinsics class as `hpc::*`. All 9 simd::tests failures hit "calling a function that requires unavailable target features: avx". Same architectural finding as documented in the file header.
2. `hpc::framebuffer::pyramid_tests::*` (3 tests) ran for 19+ minutes each under Miri without finishing. Not a UB signal: large 2D scan loops over SIMD-shaped data hitting Miri's interpreter overhead.

Both excluded. Filter now reads as 3 ANDed clauses (formatted across multiple lines + inline comments) so the next reader can map each exclusion to its reason without git-archaeology.
…l parity
The note in src/simd.rs (and the matching paragraph in scripts/miri-tests.sh) was written against an early draft of simd_nightly that defined 5 types: F32x16, F64x8, U8x64, U32x16, F32Mask16. PR #146 expanded the polyfill to full parity:

- simd_nightly: 24 types
- simd_avx512 + simd_avx2: 24 types

(F32x8/16, F64x4/8, BF16x8/16, F16x16, I8x32/64, I16x16/32, I32x16, I64x8, U8x32/64, U16x32, U32x8/16, U64x4/8, plus the F32/F64 mask types; `grep '^pub struct ' src/simd_nightly/*.rs | grep -v _original_draft | sort -u | wc -l` confirms.)

`src/simd_nightly/_original_draft.rs` survives on disk as the early 5-type sketch but is NOT in `simd_nightly/mod.rs`: a dead file, not compiled. Deleting it is a separate janitorial concern; the comment correction lands here.

The architectural follow-up for Miri-clean `hpc::*` coverage is NOT polyfill expansion; that work is done. It's a cfg(miri) switch in `src/simd.rs` that re-exports from `simd_nightly` instead of `simd_avx*` when Miri is the target. Comment rewritten to say so.
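The proposed follow-up (a `cfg(miri)` switch in `src/simd.rs`) could look roughly like the sketch below. It is an illustration under assumptions: the two backend modules here are scalar stand-ins with a hypothetical `BACKEND` constant, whereas the real `simd_nightly` wraps `core::simd` and the real `simd_avx512` wraps `_mm512_*` intrinsics.

```rust
/// Stand-in for the portable polyfill backend (the real one uses `core::simd`).
#[allow(dead_code)]
mod simd_nightly {
    #[derive(Clone, Copy)]
    pub struct F32x16(pub [f32; 16]);

    impl F32x16 {
        /// Hypothetical marker so the selected backend is observable.
        pub const BACKEND: &'static str = "polyfill";

        pub fn splat(v: f32) -> Self {
            Self([v; 16])
        }

        pub fn reduce_sum(self) -> f32 {
            self.0.iter().sum()
        }
    }
}

/// Stand-in for the intrinsics backend (the real one calls `_mm512_*`).
#[allow(dead_code)]
mod simd_avx512 {
    #[derive(Clone, Copy)]
    pub struct F32x16(pub [f32; 16]);

    impl F32x16 {
        pub const BACKEND: &'static str = "intrinsics";

        pub fn splat(v: f32) -> Self {
            Self([v; 16])
        }

        pub fn reduce_sum(self) -> f32 {
            self.0.iter().sum()
        }
    }
}

/// The dispatch module: consumers keep importing `simd::F32x16`,
/// and only the re-export changes when Miri is the target.
pub mod simd {
    #[cfg(miri)]
    pub use super::simd_nightly::F32x16;

    #[cfg(not(miri))]
    pub use super::simd_avx512::F32x16;
}

/// A consumer that never names a backend directly.
pub fn sum_of_twos() -> f32 {
    simd::F32x16::splat(2.0).reduce_sum()
}
```

The design point is that consumer modules such as `hpc::*` would not need to change: the same `crate::simd::F32x16` path resolves to the Miri-checkable type only under Miri.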
…Stream + InferenceStream + SplatFieldStream)
Cross-repo companion to lance-graph PR (sprint-12 Wave F fleet).
Three new forward-iterator scaffolds per cognitive-substrate-
convergence-v1.md §5 L-20: vertical streaming over the SoA columns
to enable the i4 hot-path sweep introduced by lance-graph PR #383
(causal-edge v2) + #384 (QualiaI4_16D) + #387 (MUL i4 evaluation).
W-F4 — QualiaStream
- NEW `src/hpc/stream/qualia.rs` (~185 LOC + 6 tests)
- `QualiaI4Row(pub u64)` — bit-compatible local mirror of
`lance_graph_contract::qualia::QualiaI4_16D` (no cross-crate import
to avoid producer↔consumer circular dep; documented as
intentional in lance-graph TYPE_DUPLICATION_MAP)
- `QualiaStream<'a>` forward-iterator over `&[QualiaI4Row]`,
yielding `(usize, &QualiaI4Row)` tuples
- Iterator + ExactSizeIterator impls; `new`/`len`/`is_empty`/
`remaining`/`reset`
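A minimal sketch of the shape described for W-F4, assuming a cursor-over-slice design. The field names and the choice that `len` reports the total row count while `remaining` reports rows not yet yielded are assumptions; the actual `src/hpc/stream/qualia.rs` may differ.

```rust
/// Local mirror of the 64-bit qualia row (layout details omitted here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct QualiaI4Row(pub u64);

/// Forward-only cursor over a borrowed column of rows.
pub struct QualiaStream<'a> {
    rows: &'a [QualiaI4Row],
    pos: usize,
}

impl<'a> QualiaStream<'a> {
    pub fn new(rows: &'a [QualiaI4Row]) -> Self {
        Self { rows, pos: 0 }
    }

    /// Total number of rows in the underlying column (assumed meaning).
    pub fn len(&self) -> usize {
        self.rows.len()
    }

    pub fn is_empty(&self) -> bool {
        self.rows.is_empty()
    }

    /// Rows not yet yielded.
    pub fn remaining(&self) -> usize {
        self.rows.len() - self.pos
    }

    /// Rewind the cursor to the first row.
    pub fn reset(&mut self) {
        self.pos = 0;
    }
}

impl<'a> Iterator for QualiaStream<'a> {
    type Item = (usize, &'a QualiaI4Row);

    fn next(&mut self) -> Option<Self::Item> {
        let row = self.rows.get(self.pos)?;
        let idx = self.pos;
        self.pos += 1;
        Some((idx, row))
    }

    /// Exact bounds, as required for `ExactSizeIterator`.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let n = self.remaining();
        (n, Some(n))
    }
}

impl ExactSizeIterator for QualiaStream<'_> {}
```

Yielding `(usize, &QualiaI4Row)` keeps the row index available to callers without a separate `enumerate()`, and `reset` lets one stream be swept more than once.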
W-F5 — InferenceStream
- NEW `src/hpc/stream/inference.rs` (~223 LOC + 6 tests)
- `InferenceRow(pub u64)` — bit-compatible with
`causal_edge::CausalEdge64` v2 layout (bits 46-49 signed
mantissa, bits 53-58 W-slot 6b)
- `inference_mantissa() -> i8` via `(raw << 4) >> 4` arithmetic-
shift sign-extension (matches causal-edge's accessor)
- `w_slot() -> u8` via bits-53..58 mask
- `InferenceStream<'a>` Iterator + ExactSizeIterator
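The two accessors can be sketched from the bit positions given above. This is one reading, not the committed code: it assumes the `<< 4` / `>> 4` pair is applied to the extracted 4-bit field as an `i8`, and that both bit ranges are inclusive.

```rust
/// Local mirror of the 64-bit causal edge word.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InferenceRow(pub u64);

impl InferenceRow {
    /// Signed 4-bit mantissa stored in bits 46..=49.
    pub fn inference_mantissa(&self) -> i8 {
        // Isolate the nibble, move it to the top of a byte, then let the
        // arithmetic right shift on `i8` replicate the sign bit.
        let nibble = ((self.0 >> 46) & 0xF) as u8;
        ((nibble << 4) as i8) >> 4
    }

    /// Unsigned 6-bit W-slot stored in bits 53..=58.
    pub fn w_slot(&self) -> u8 {
        ((self.0 >> 53) & 0x3F) as u8
    }
}

/// Hypothetical helper for building test words; not part of the PR.
pub fn pack(mantissa: i8, w_slot: u8) -> InferenceRow {
    let m = (mantissa as u8 as u64) & 0xF;
    let w = (w_slot as u64) & 0x3F;
    InferenceRow((m << 46) | (w << 53))
}
```

With these assumptions the mantissa covers -8..=7 and the W-slot covers 0..=63; any layout change in causal-edge would need the shifts and masks updated in step.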
W-F6 — SplatFieldStream
- NEW `src/hpc/stream/splat_field.rs` (~240 LOC + 6 tests)
- `SplatField { mean: u32, variance: f32, energy: f32,
generation: u32 }` `#[repr(C, align(16))]` — bit-compatible
with `thinking_engine::splat_ops::SplatField` from W-F7
- `SplatFieldStream<'a>` Iterator + `filter_energy_above(threshold)`
combinator returning `impl Iterator` (no allocation)
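A sketch of the W-F6 struct and its allocation-free combinator, under assumptions: the stream's item type `(usize, &SplatField)` and the strict `>` comparison are guesses, since the commit message states only the struct layout and that `filter_energy_above` returns `impl Iterator`.

```rust
/// 16-byte, 16-byte-aligned record, field order as listed in the commit.
#[repr(C, align(16))]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SplatField {
    pub mean: u32,
    pub variance: f32,
    pub energy: f32,
    pub generation: u32,
}

/// Forward-only cursor over a borrowed column of splat fields.
pub struct SplatFieldStream<'a> {
    fields: &'a [SplatField],
    pos: usize,
}

impl<'a> SplatFieldStream<'a> {
    pub fn new(fields: &'a [SplatField]) -> Self {
        Self { fields, pos: 0 }
    }

    /// Lazily keeps only entries whose energy exceeds `threshold`.
    /// Returns an adapter over `self`, so nothing is collected or allocated.
    pub fn filter_energy_above(
        self,
        threshold: f32,
    ) -> impl Iterator<Item = (usize, &'a SplatField)> {
        self.filter(move |(_, f)| f.energy > threshold)
    }
}

impl<'a> Iterator for SplatFieldStream<'a> {
    type Item = (usize, &'a SplatField);

    fn next(&mut self) -> Option<Self::Item> {
        let field = self.fields.get(self.pos)?;
        let idx = self.pos;
        self.pos += 1;
        Some((idx, field))
    }
}
```

Because the combinator consumes the stream and wraps it in `Iterator::filter`, the surviving indices still refer to positions in the original column.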
`src/hpc/mod.rs` — added `pub mod stream;` declaration.
`src/hpc/stream/mod.rs` — registers all three submodules + re-exports:
`InferenceRow / InferenceStream`, `QualiaI4Row / QualiaStream`,
`SplatField / SplatFieldStream`.
This commit resolves CSI-9 from the W-Meta-Opus honest review:
the original W-F4 and W-F6 worker outputs were file-creating but
mod.rs-orphan; only W-F5 had registered itself. Companion lance-
graph commit pushes the parent meta-review + W-F2/W-F3/W-F1
registration fixes (CSI-7/8).
Test status: `cargo test --lib hpc::stream` — **18/18 pass** (6
each × 3 streams). No regressions on the rest of the hpc suite.
Sprint-13+: `par_*` rayon variants once rayon is wired into the
ndarray feature gate (existing TECH_DEBT note).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2a1a1e3828
/// # Example
///
/// ```
/// use crate::hpc::stream::qualia::{QualiaI4Row, QualiaStream};
Use the crate name in doctest imports
When `cargo test` runs doctests, examples are compiled as an external crate, so `crate::hpc::...` refers to the generated test crate rather than `ndarray`. This import therefore fails with "could not find `hpc` in the crate root"; the same pattern in `src/hpc/stream/splat_field.rs` has the same problem. Use `ndarray::hpc::...` here, as in the new `InferenceStream` example.
Wave F stream scaffolds (qualia.rs / inference.rs / splat_field.rs) needed rustfmt 1.95 reformatting per the CI fmt gate. Same family of single-line / multi-line layout adjustments that hit lance-graph PRs #383 / #384 / #386. Pure mechanical reformat via `cargo fmt`. No behavior change; 18/18 stream tests still pass.
…(codex P2)
Codex P2 flagged that doctest examples compile as an external test crate, so `crate::hpc::...` doesn't resolve; `cargo test --doc` would fail with "could not find hpc in the crate root". W-F5's InferenceStream got this right; W-F4 (QualiaStream) and W-F6 (SplatFieldStream) used `crate::hpc::...` and would have broken doctest CI. Two doctest import paths corrected to use the external crate name `ndarray::hpc::...` (matching the W-F5 pattern).
Summary
Cross-repo companion to lance-graph sprint-12 Wave F fleet. Three new forward-iterator scaffolds per `cognitive-substrate-convergence-v1.md` §5 L-20: vertical streaming over the SoA columns to enable the i4 hot-path sweep landed in lance-graph PRs #383 (causal-edge v2) / #384 (QualiaI4_16D) / #387 (MUL i4 evaluation).

What's new
- `src/hpc/stream/qualia.rs`: `QualiaI4Row` + `QualiaStream<'a>`. Bit-compatible local mirror of `lance_graph_contract::qualia::QualiaI4_16D`; no cross-crate import, to avoid a producer↔consumer circular dep.
- `src/hpc/stream/inference.rs`: `InferenceRow` + `InferenceStream<'a>`. Bit-compatible with the `causal_edge::CausalEdge64` v2 layout. `inference_mantissa() -> i8` via arithmetic-shift sign-extension; `w_slot() -> u8` via bits-53..58 mask.
- `src/hpc/stream/splat_field.rs`: `SplatField` + `SplatFieldStream<'a>`. `#[repr(C, align(16))]` 16-byte struct (mean / variance / energy / generation); `filter_energy_above` combinator.

All three: `new`/`len`/`is_empty`/`remaining`/`reset` + `Iterator` + `ExactSizeIterator` impls. Forward-only scaffolds; `par_*` rayon variants deferred to sprint-13+.

`src/hpc/mod.rs` adds `pub mod stream;`. `src/hpc/stream/mod.rs` registers all three submodules + re-exports.

Why bit-compatible local mirrors (not direct imports)
ndarray is the producer layer; `lance-graph-contract` and `causal-edge` are consumers. Importing those crates here would create a circular dep through the workspace. Instead the local types match the bit layout exactly (documented in lance-graph's `docs/TYPE_DUPLICATION_MAP.md` as intentional, with the rule that any layout change on the consumer side requires a paired change here).

CSI-9 from the W-Meta-Opus honest review
W-F4 + W-F6 originally left `mod.rs` orphan (only W-F5 self-registered); Opus surfaced this as CSI-9. This commit closes that gap.

Test plan
- `cargo test --lib hpc::stream`: 18/18 pass (6 each × 3 streams)

Cross-refs
- `claude/sprint-12-wave-f-fleet` (open in `AdaWorldAPI/lance-graph`)
- `lance-graph/.claude/plans/cognitive-substrate-convergence-v1.md` §5 L-20, §11 D-CSV-11
- `lance-graph/docs/TYPE_DUPLICATION_MAP.md` (Wave F additions)
- `lance-graph/.claude/board/sprint-log-11/meta-review-opus.md`

🤖 Generated with Claude Code