feat(hpc): l1/l2/linf SIMD kernels + stability docs (Sprint 0) #160
Merged
…PI ndarray fork

Two new docs codifying this fork's role as the canonical contract-shape example for the four-repo integration (per .claude/plans/integration-plan.md §5).

* docs/hpc-stability.md (902 lines, ND-1)
  The frozen public surface, what "frozen" means, the additive pattern for new kernels (e.g. F32x16, int8, Hamming as new symbols next to existing ones), the diamond-dep guard contract with surrealdb's [patch.crates-io] block, and the aspirational cross-arch CI matrix.
* docs/hpc-api-inventory.md (363 lines, ND-2)
  Catalogue of the existing public HPC surface with file:line citations. Confirms F64x8 + heel_f64x8::cosine_f64_simd are present and stable.

PP-15 baton-handoff-auditor finding from this wave:
The plan §5 promised stable signatures for `heel_f64x8::l1_f64_simd`, `l2_f64_simd`, and `linf_f64_simd`, but both workers independently confirmed those three functions are ABSENT from the source today. Documented as "aspirational reserved names with intended signatures" in hpc-stability.md so the contract-shape promise is honest; adding the missing functions is Sprint 0a (out of scope of this wave).

Workers: ND-1, ND-2.
Sprint 0 of the four-repo wave (docs only; no source changes).
…l_f64x8

Resolves the PP-15 baton-handoff finding from wave-1: the integration plan §5 stable-surface table promised these three functions, but the ND-1 + ND-2 audits confirmed only cosine_f64_simd existed in source. Three "aspirational reserved names" → three real public APIs.

Implementation matches cosine_f64_simd's pattern exactly:
- debug_assert_eq! length guard
- n / 8 F64x8 chunks via from_slice
- l1: diff.abs() + acc accumulation, reduce_sum
- l2: diff.mul_add(diff, acc) (FMA square), reduce_sum().sqrt()
- linf: simd_max over abs, reduce_max
- scalar tail loop for remainder

Tests (l1_l2_linf_tests, 3 new):
- l1_zero_for_equal_inputs: L1 of equal slices is exactly 0.0
- l2_matches_scalar_reference: within 100 * f64::EPSILON of scalar
- linf_picks_the_largest_gap: Linf of [0,0,5,0] vs zeros is 5.0

All 15 heel_f64x8 tests pass (12 prior + 3 new). docs/hpc-stability.md in this same branch now describes real APIs, not promises.

Worker: W-ND-3. Wave-2 of the four-repo integration.
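The chunk-plus-scalar-tail pattern described in this commit can be sketched in portable Rust. This is a minimal illustration, not the fork's code: a `[f64; 8]` accumulator stands in for the fork's `F64x8` type (whose API is not shown on this page), and the names `LANES`, `body_and_tail`, and the `*_sketch` functions are hypothetical.

```rust
/// Lane count of the stand-in accumulator (mirrors the 8 lanes of F64x8).
const LANES: usize = 8;

/// Splits both slices into a body whose length is a multiple of LANES and a tail.
/// Hypothetical helper; the length guard is debug-only, as in the commit message.
fn body_and_tail<'a>(
    a: &'a [f64],
    b: &'a [f64],
) -> (&'a [f64], &'a [f64], &'a [f64], &'a [f64]) {
    debug_assert_eq!(a.len(), b.len());
    let split = a.len() - a.len() % LANES;
    let (a_body, a_tail) = a.split_at(split);
    let (b_body, b_tail) = b.split_at(split);
    (a_body, a_tail, b_body, b_tail)
}

/// L1 distance: per-lane accumulation of |a - b|, lane sum, then scalar tail.
pub fn l1_f64_sketch(a: &[f64], b: &[f64]) -> f64 {
    let (a_body, a_tail, b_body, b_tail) = body_and_tail(a, b);
    let mut acc = [0.0f64; LANES];
    for (ca, cb) in a_body.chunks_exact(LANES).zip(b_body.chunks_exact(LANES)) {
        for i in 0..LANES {
            acc[i] += (ca[i] - cb[i]).abs();
        }
    }
    let mut sum: f64 = acc.iter().sum();
    for (x, y) in a_tail.iter().zip(b_tail) {
        sum += (x - y).abs();
    }
    sum
}

/// L2 distance: fused multiply-add of the squared difference, then sqrt.
pub fn l2_f64_sketch(a: &[f64], b: &[f64]) -> f64 {
    let (a_body, a_tail, b_body, b_tail) = body_and_tail(a, b);
    let mut acc = [0.0f64; LANES];
    for (ca, cb) in a_body.chunks_exact(LANES).zip(b_body.chunks_exact(LANES)) {
        for i in 0..LANES {
            let d = ca[i] - cb[i];
            acc[i] = d.mul_add(d, acc[i]);
        }
    }
    let mut sum: f64 = acc.iter().sum();
    for (x, y) in a_tail.iter().zip(b_tail) {
        let d = x - y;
        sum = d.mul_add(d, sum);
    }
    sum.sqrt()
}

/// Linf distance: per-lane running max of |a - b|, lane max, then scalar tail.
pub fn linf_f64_sketch(a: &[f64], b: &[f64]) -> f64 {
    let (a_body, a_tail, b_body, b_tail) = body_and_tail(a, b);
    let mut acc = [0.0f64; LANES];
    for (ca, cb) in a_body.chunks_exact(LANES).zip(b_body.chunks_exact(LANES)) {
        for i in 0..LANES {
            acc[i] = acc[i].max((ca[i] - cb[i]).abs());
        }
    }
    let mut max = acc.iter().copied().fold(0.0f64, f64::max);
    for (x, y) in a_tail.iter().zip(b_tail) {
        max = max.max((x - y).abs());
    }
    max
}
```

Because the lanes are summed in a different order than a plain scalar loop, the L2 result can differ from a scalar reference by a few ulps, which is presumably why the commit's test allows a tolerance rather than demanding bit equality.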
Addresses PP-13 HOLD finding: docs/hpc-stability.md framed l1_f64_simd/l2_f64_simd/linf_f64_simd as "aspirational reserved names" because they were absent from source at wave-1 time. Wave-2 commit 71cdbd4 materialised all three; the doc now opens with a prepended status block stating the freeze commitment is load-bearing, not aspirational. The "Stable public surface" table below the banner now describes REAL APIs. Per PP-13 savant audit. Wave-3 of the four-repo integration.
AdaWorldAPI added a commit that referenced this pull request on May 19, 2026
Revert #160 — HPC stability docs + l1/l2/linf kernels (four-repo arc)
AdaWorldAPI pushed a commit that referenced this pull request on May 19, 2026

…vert note

Closes the architectural synthesis arc with three additions to the consolidation doc + one companion flex prompt:

1. Four-tier picture (Cognitive / Analytic / Search / Graph): three of four legacy Bardioc layers have pre-existing Rust-native successors (Databend, Tantivy, lance-graph) that aren't HHTL. HHTL only has to win the cognitive layer it was designed for. Migration scope shrinks proportionally.
2. "Why we don't transcode ClickHouse" section: full transcode is 5-10 engineer-years (TiKV / Servo / CockroachDB reference points). Three cheaper escape hatches enumerated; path C (adopt Databend + ndarray::simd) recommended over path A (FFI inject) or path B (executor-only transcode). C# RavenDB / EventStoreDB ecosystem analog noted.
3. PR #404 reference updated to reflect 2026-05-19 rollback: code attempt withdrawn, architectural intent preserved as next-cycle target.

Companion flex prompt: databend-ndarray-simd-prompt.md. 24-hour budget (half the trojan-horse prompt since Databend is already Rust-native, no FFI bridge). Three-engine benchmark target (stock ClickHouse + stock Databend + ndarray-Databend) against TPC-H + ClickBench + cognitive mini-workload.

Sits at path C in the four-prompt strategic arc:

1. bardioc-weekend-rebuild (baseline): measure honest legacy
2. stack-consolidation (this doc): strategic frame
3. ndarray-simd-trojan-horse (path A): FFI inject ClickHouse + Tantivy
4. databend-ndarray-simd (path C, this prompt): adopt Rust-native successor

No code changes; pure strategy docs. Branch already in master via PR #159 merge (not affected by #160 / #161 revert chain).
AdaWorldAPI pushed a commit that referenced this pull request on May 19, 2026
Two layered updates from the post-PR #404-rollback session, both folded into the existing consolidation + PR-X10 docs so PR #162 carries them.

## Layer 1: PR #404 / PR #160 rollback salvage

### `heel_f64x8::{l1,l2,linf}_f64_simd` → PR-X10 A6 `linalg::distance`

The distance kernels were correct; the framing was wrong (filed as "Sprint 0a of a four-repo integration arc" with cross-repo coupling that made the rollback inherently cross-repo). Re-emerges as `ndarray::hpc::linalg::distance::{l1,l2,linf}_f64_simd` under worker A6.

- `pr-x10-linalg-core-design.md`: added `distance.rs` to the module tree, new section "Distance kernels — `linalg::distance`" with API surface + precision class, A6 worker row updated (~400 → ~500 LoC, files now include `linalg/distance.rs`)
- `stack-consolidation-bardioc-to-hhtl.md`: new "Salvage from the 2026-05-19 cross-repo rollback (PR #404 / PR #160)" section names the re-entry point + lesson for future cross-repo arcs

### `lance-graph-contract::{ir, provider, actor}` → mostly dead, except…

`Operator`, `Cardinality`, `EngineHint`, `MvccProvider` types are correctly dead (HHTL covers natively). **Exception**: `SupervisableShader` + `RestartBackoff` reserved as *column-flip-cycle commitment-gate primitives* for a future PR-X14 in lance-graph. They wrap Ractor handlers that own a column-flip cycle (read column → compute → flip state-flag → reply / drop). Re-framed below under the column-substrate-identity model: they encode flip-cycle semantics, not cross-store boundary plumbing.

## Layer 2: Column-substrate identity (the deeper reframe)

The post-rollback session also produced an architectural collapse that supersedes parts of the existing consolidation doc. Encoded across three sections:

### New § "Column-substrate identity: Lance ≡ Arrow ≡ ndarray SoA"

One physical representation, end-to-end. Lance column ≡ Arrow column buffer ≡ ndarray SoA: same bytes viewed through three names. Every dialect surface (lance-graph cascade, SurrealDB, sea-orm, Databend, Tantivy) parses its query language down to operations on those same bytes. ndarray pays for the SIMD primitive once; the whole stack collects rent.

Rubicon = *column-state flip*, not write event. A Thought is a Lance row from allocation to query by any surface. "Crossing the Rubicon" means flipping (e.g.) `committed: false → true`, which is versioned natively by Lance and observed by any LIVE watcher with a matching predicate, with no serialisation.

Section includes:
- The full Lance/Arrow/ndarray-SoA diagram with the five dialect surfaces
- "What this dissolves" table: 7 earlier framings now superseded (mailbox writes, MvccProvider threading, surrealdb-ractor cf-event, sea-orm entity-actor dispatch, Zone-as-storage-tier, TiKV-as-routing, kv-lance-as-translation)
- "What survives — JITson / Cranelift, cleaner than before": the compile-time → JIT pipeline (DeriveEntityModel → Cranelift kernel specialisation against OGIT-derived column types; ontology evolution triggers next compile cycle → all surfaces auto-inherit)
- "Implication for the four-tier picture": the substrate claim becomes load-bearing in the right way; column IS the SoA IS the ndarray buffer

### Zone-model section rewritten

Zones are now defined as **temporal phases of column state on a single Lance dataset**, not storage tiers. Table columns: column-state phase (`committed=false` / `committed=true` / `egressed_at IS NOT NULL`), which surface watches each phase, what "being in this zone" means. Same physical bytes throughout: a row does not "move" from zone 1 to zone 2; a column flips and the LIVE watchers notice. Section ends pointing at § "Column-substrate identity" for the full unification.

### Click-moments inventory: three → four

Click-moment #2 (Ractor `&mut self`) gets a refinement note about mailbox-cycle Rubicon (no physical boundary). New click-moment #4: "Multi-store consistency / cross-zone messaging looked like the hard coordination problem → column-substrate identity shows there is no cross-zone messaging." Concluding paragraph distinguishes the three workload-shape dissolutions (#1-3) from the substrate-identity dissolution (#4), which makes the others' "no copy, no marshal, no coordination" claims literal.

### Salvage section's SupervisableShader framing updated

The earlier "zone-1↔zone-2 boundary" language was already wrong twice in this PR; final framing under column-substrate identity: these are column-flip-cycle commitment primitives. Lance's version chain provides the natural retry semantics. The handler's "supervision boundary" is the flip-cycle, not a perimeter, because there is no second store.

## Status after this commit

- PR #162 now carries: Phase 2 entry artifacts (canary plan + execution prompt + PR-X10 verdict patch P1-1) AND PR #404 rollback salvage AND column-substrate-identity reframe
- All four click-moments documented; framing across Zone model, Click-moments inventory, and Salvage section is consistent
- PR-X10 A6 absorbs heel_f64x8 distance kernels with bench parity gate
- Re-entry path for SupervisableShader + RestartBackoff named (future PR-X14 in lance-graph; first consumer is the NARS-revision handler that flips `revised: false → true` per column-flip semantics)
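The "zones as temporal phases of column state" idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ThoughtRow` struct, the `Phase` names, the `u64` timestamp, and the rule that a non-null `egressed_at` takes precedence over `committed` are not taken from the consolidation doc, and no Lance API is used.

```rust
/// Hypothetical row shape holding only the two state columns named in the text.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ThoughtRow {
    pub committed: bool,
    /// `None` models `egressed_at IS NULL`; the u64 is an assumed timestamp.
    pub egressed_at: Option<u64>,
}

/// Hypothetical phase names for the three column-state phases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Uncommitted,
    Committed,
    Egressed,
}

/// The phase is derived from column state alone; the row itself never moves.
pub fn phase_of(row: &ThoughtRow) -> Phase {
    match (row.egressed_at, row.committed) {
        (Some(_), _) => Phase::Egressed, // assumed precedence
        (None, true) => Phase::Committed,
        (None, false) => Phase::Uncommitted,
    }
}

/// Flips `committed` in place and reports whether the state actually changed.
pub fn commit(row: &mut ThoughtRow) -> bool {
    let flipped = !row.committed;
    row.committed = true;
    flipped
}
```

In this sketch a watcher would be any code that re-evaluates `phase_of` after a flip; the point is only that the phase is a function of the columns, so there is nothing to copy between stores.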
Summary
Sprint 0 + Sprint 0a of the four-repo integration plan (lance-graph ↔ surrealdb ↔ sea-orm ↔ ndarray). This repo provides the shared low-level numeric substrate (SIMD distance kernels) consumed by surrealdb-core and the lance-graph cognitive crates. The contract-shape commitment here is absolute: every symbol promised in plan §5 must exist in compile-checked source, not as an aspirational reserved name.
- `docs/hpc-stability.md` (902 lines): codifies the frozen public surface (`F64x8` + `heel_f64x8::*`), what "frozen" means (no signature change, no rename, no semantic drift), the additive pattern for new kernels, and the diamond-dep guard contract with surrealdb's `[patch.crates-io]` block.
- `docs/hpc-api-inventory.md` (363 lines): independent catalogue of the existing public HPC surface with file:line citations. Surfaced the PP-15 gap (see below).
- `src/hpc/heel_f64x8.rs`: three new SIMD distance kernels added: `l1_f64_simd`, `l2_f64_simd`, `linf_f64_simd`. These were promised in plan §5 as "signature frozen" but the ND-1 + ND-2 audits confirmed they were absent at wave-1 time. Wave-2 materialised all three, matching `cosine_f64_simd`'s F64x8-chunk + scalar-tail pattern.

Commits in this PR (oldest first)
- 6d5bfd8
- f4bdc93
- 15c7bd9
- 71cdbd4
- b807075

Coordinated PRs in sibling repos
- claude/lance-surrealdb-analysis-LXmug
- claude/lance-surrealdb-analysis-LXmug
- claude/lance-surrealdb-analysis-LXmug

PP-15 baton-handoff finding → resolved
The PP-15 baton-handoff-auditor savant in wave-1 flagged that `heel_f64x8::l1_f64_simd` / `l2_f64_simd` / `linf_f64_simd` were promised in plan §5 but absent from source. This created a "stability document that freezes non-existent symbols" misleading-doc anti-pattern. Resolved in wave-2 commit `71cdbd4` (functions materialised) + wave-3 commit `b807075` (stability doc prepended with status banner). The stable public surface table now describes REAL APIs, not promises.

Test plan
- `cargo test --lib -p ndarray heel_f64x8`: 15/15 tests pass (12 existing + 3 new for `l1_zero_for_equal_inputs`, `l2_matches_scalar_reference`, `linf_picks_the_largest_gap`)
- `[patch.crates-io] ndarray = { git = "..." }` block still routes through this fork's 0.17 line
- `lance-index` for the 0.16 → 0.17 ndarray bump (out of scope)

API stability commitment (load-bearing)
This repo is the canonical contract-shape example of the four-repo integration. Every symbol the other three repos depend on must hold its signature. The commitment in `docs/hpc-stability.md` is absolute: no signature change, no rename, no semantic drift. New kernels arrive as new functions next to existing ones (e.g., a hypothetical FMA cosine variant would land as `cosine_f64_simd_fma`, with `cosine_f64_simd` unchanged).

https://claude.ai/code/session_01LiUiGeUDLje8KMnxB4FfA3
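The additive pattern in the stability commitment can be sketched as two functions living side by side. This is an illustration under stated assumptions: the `(&[f64], &[f64]) -> f64` signature and the "returns cosine similarity" semantics are guesses, since the real `cosine_f64_simd` signature is not shown on this page, and the bodies are plain scalar loops rather than SIMD code. The name `cosine_f64_simd_fma` is the hypothetical one from the PR description.

```rust
/// Stand-in for the frozen kernel: its name, signature, and semantics stay as they are.
/// Assumed here to return cosine similarity; zero-norm inputs yield NaN in this sketch.
pub fn cosine_f64_simd(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// New variant added next to the frozen one under a new name.
/// Uses fused multiply-add, so results may differ from the original in the last bits.
pub fn cosine_f64_simd_fma(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in a.iter().zip(b) {
        dot = x.mul_add(*y, dot);
        na = x.mul_add(*x, na);
        nb = y.mul_add(*y, nb);
    }
    dot / (na.sqrt() * nb.sqrt())
}
```

Existing callers keep linking against `cosine_f64_simd` and see no change; a caller that wants the FMA rounding behaviour opts in by name, which is what lets the old symbol stay free of semantic drift.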
Generated by Claude Code