
feat(hpc): l1/l2/linf SIMD kernels + stability docs (Sprint 0)#160

Merged
AdaWorldAPI merged 6 commits into master from claude/lance-surrealdb-analysis-LXmug
May 19, 2026

Conversation

@AdaWorldAPI
Owner

Summary

Sprint 0 + Sprint 0a of the four-repo integration plan (lance-graph ↔ surrealdb ↔ sea-orm ↔ ndarray). This repo provides the shared low-level numeric substrate (SIMD distance kernels) consumed by surrealdb-core and the lance-graph cognitive crates. The contract-shape commitment here is absolute: every symbol promised in plan §5 must exist in compile-checked source, not as an aspirational reserved name.

  • docs/hpc-stability.md (902 lines) — codifies the frozen public surface (F64x8 + heel_f64x8::*), what "frozen" means (no signature change, no rename, no semantic drift), the additive pattern for new kernels, and the diamond-dep guard contract with surrealdb's [patch.crates-io] block.
  • docs/hpc-api-inventory.md (363 lines) — independent catalogue of the existing public HPC surface with file:line citations. Surfaced the PP-15 gap (see below).
  • src/hpc/heel_f64x8.rs — three new SIMD distance kernels added: l1_f64_simd, l2_f64_simd, linf_f64_simd. These were promised in plan §5 as "signature frozen", but the ND-1 + ND-2 audits confirmed they were absent at wave-1 time. Wave-2 materialised all three, matching cosine_f64_simd's F64x8-chunk + scalar-tail pattern.

Commits in this PR (oldest first)

| Commit | Description |
| --- | --- |
| 6d5bfd8 | plans: integration plan for ndarray's role in the four-repo convergence |
| f4bdc93 | plans: codify additive-only API stability commitment as this repo's contract |
| 15c7bd9 | docs(hpc): add stability commitment + API inventory for the AdaWorldAPI ndarray fork |
| 71cdbd4 | feat(hpc): materialise l1_f64_simd, l2_f64_simd, linf_f64_simd in heel_f64x8 |
| b807075 | docs(hpc): mark l1/l2/linf as materialised (wave-2 update prepended) |

Coordinated PRs in sibling repos

| Repo | PR |
| --- | --- |
| AdaWorldAPI/lance-graph | claude/lance-surrealdb-analysis-LXmug |
| AdaWorldAPI/surrealdb | claude/lance-surrealdb-analysis-LXmug |
| AdaWorldAPI/sea-orm | claude/lance-surrealdb-analysis-LXmug |

PP-15 baton-handoff finding → resolved

The PP-15 baton-handoff-auditor savant in wave-1 flagged that heel_f64x8::l1_f64_simd / l2_f64_simd / linf_f64_simd were promised in plan §5 but absent from source. The result was a misleading-doc anti-pattern: a stability document that froze non-existent symbols. Resolved in wave-2 commit 71cdbd4 (functions materialised) + wave-3 commit b807075 (stability doc prepended with status banner). The stable public surface table now describes REAL APIs, not promises.

Test plan

  • cargo test --lib -p ndarray heel_f64x8 — 15/15 tests pass (12 existing + 3 new for l1_zero_for_equal_inputs, l2_matches_scalar_reference, linf_picks_the_largest_gap)
  • Cross-arch: not blocked on this PR (Sprint 0 doc commitment includes an aspirational CI matrix; not yet wired)
  • Verified the diamond-dep guard contract: surrealdb's [patch.crates-io] ndarray = { git = "..." } block still routes through this fork's 0.17 line
  • Sprint 3+: F32x16 + int8 + Hamming kernels as new additive symbols (out of scope of this PR)
  • Track upstream lance-index for the 0.16 → 0.17 ndarray bump (out of scope)

API stability commitment (load-bearing)

This repo is the canonical contract-shape example of the four-repo integration. Every symbol the other three repos depend on must hold its signature. The commitment in docs/hpc-stability.md is absolute: no signature change, no rename, no semantic drift. New kernels arrive as new functions next to existing ones (e.g., a hypothetical FMA cosine variant would land as cosine_f64_simd_fma, with cosine_f64_simd unchanged).
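The additive pattern can be illustrated with a minimal sketch. The real signature of cosine_f64_simd is not shown in this PR, so both functions below use hypothetical names and an assumed `(&[f64], &[f64]) -> f64` shape, with plain scalar loops standing in for the F64x8 code path. The point is only that the new variant lands beside the old one and the old one is left untouched.

```rust
/// Stand-in for an existing frozen kernel. The name and signature are
/// hypothetical; the real cosine_f64_simd is not reproduced here.
/// Returns NaN if either input has zero norm.
pub fn cosine_f64_sketch(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}

/// Additive variant: a NEW symbol next to the old one. The original
/// function above keeps its name, signature, and rounding behaviour.
pub fn cosine_f64_sketch_fma(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        // mul_add computes x * y + acc with a single rounding step.
        dot = x.mul_add(y, dot);
        na = x.mul_add(x, na);
        nb = y.mul_add(y, nb);
    }
    dot / (na.sqrt() * nb.sqrt())
}
```

Because the FMA variant may round differently from the original, shipping it under a new name is what keeps the "no semantic drift" promise for existing callers.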

https://claude.ai/code/session_01LiUiGeUDLje8KMnxB4FfA3


Generated by Claude Code

AdaWorldAPI and others added 6 commits May 18, 2026 13:09
…PI ndarray fork

Two new docs codifying this fork's role as the canonical contract-shape
example for the four-repo integration (per .claude/plans/integration-plan.md §5).

* docs/hpc-stability.md (902 lines, ND-1)
  The frozen public surface, what "frozen" means, the additive pattern
  for new kernels (e.g. F32x16, int8, Hamming as new symbols next to
  existing ones), the diamond-dep guard contract with surrealdb's
  [patch.crates-io] block, and the aspirational cross-arch CI matrix.

* docs/hpc-api-inventory.md (363 lines, ND-2)
  Catalogue of the existing public HPC surface with file:line citations.
  Confirms F64x8 + heel_f64x8::cosine_f64_simd are present and stable.

PP-15 baton-handoff-auditor finding from this wave:
The plan §5 promised stable signatures for `heel_f64x8::l1_f64_simd`,
`l2_f64_simd`, and `linf_f64_simd` — both workers independently
confirmed those three functions are ABSENT from the source today.
Documented as "aspirational reserved names with intended signatures"
in hpc-stability.md so the contract-shape promise is honest;
adding the missing functions is Sprint 0a (out of scope of this wave).

Workers: ND-1, ND-2. Sprint 0 of the four-repo wave (docs only;
no source changes).

…l_f64x8

Resolves the PP-15 baton-handoff finding from wave-1: the integration
plan §5 stable-surface table promised these three functions, but the
ND-1 + ND-2 audits confirmed only cosine_f64_simd existed in source.
Three "aspirational reserved names" → three real public APIs.

Implementation matches cosine_f64_simd's pattern exactly:
- debug_assert_eq! length guard
- n / 8 F64x8 chunks via from_slice
- l1: diff.abs() + acc accumulation, reduce_sum
- l2: diff.mul_add(diff, acc) (FMA square), reduce_sum().sqrt()
- linf: simd_max over abs, reduce_max
- scalar tail loop for remainder
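
The pattern listed above can be sketched in portable Rust. This is not the code from commit 71cdbd4: the fork's F64x8 type is not reproduced here, so an `[f64; 8]` accumulator stands in for the vector register, and the `_sketch` function names are hypothetical.

```rust
/// Lane count of the chunked loop; mirrors an 8-wide f64 vector such as F64x8.
const LANES: usize = 8;

/// L1 distance: per-lane |a - b| accumulation, lane sum, then scalar tail.
pub fn l1_f64_sketch(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len());
    let chunks = a.len() / LANES;
    let mut acc = [0.0f64; LANES];
    for c in 0..chunks {
        let base = c * LANES;
        for lane in 0..LANES {
            acc[lane] += (a[base + lane] - b[base + lane]).abs();
        }
    }
    let mut sum: f64 = acc.iter().sum(); // stands in for reduce_sum
    for i in chunks * LANES..a.len() {
        sum += (a[i] - b[i]).abs(); // remainder elements
    }
    sum
}

/// L2 distance: fused multiply-add of squared differences, then sqrt.
pub fn l2_f64_sketch(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len());
    let chunks = a.len() / LANES;
    let mut acc = [0.0f64; LANES];
    for c in 0..chunks {
        let base = c * LANES;
        for lane in 0..LANES {
            let d = a[base + lane] - b[base + lane];
            acc[lane] = d.mul_add(d, acc[lane]);
        }
    }
    let mut sum: f64 = acc.iter().sum();
    for i in chunks * LANES..a.len() {
        let d = a[i] - b[i];
        sum = d.mul_add(d, sum);
    }
    sum.sqrt()
}

/// Linf distance: per-lane running max of |a - b|, then max across lanes.
pub fn linf_f64_sketch(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len());
    let chunks = a.len() / LANES;
    let mut acc = [0.0f64; LANES];
    for c in 0..chunks {
        let base = c * LANES;
        for lane in 0..LANES {
            acc[lane] = acc[lane].max((a[base + lane] - b[base + lane]).abs());
        }
    }
    let mut m = acc.iter().fold(0.0f64, |m, &x| m.max(x)); // stands in for reduce_max
    for i in chunks * LANES..a.len() {
        m = m.max((a[i] - b[i]).abs());
    }
    m
}
```

A 10-element input exercises both paths: one full chunk of eight plus a two-element scalar tail.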

Tests (l1_l2_linf_tests, 3 new):
- l1_zero_for_equal_inputs — L1 of equal slices is exactly 0.0
- l2_matches_scalar_reference — within 100 * f64::EPSILON of scalar
- linf_picks_the_largest_gap — Linf of [0,0,5,0] vs zeros is 5.0
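
As a rough illustration of what these tests compare against, here is a sketch of scalar reference implementations and a tolerance check. The function names are hypothetical, and treating "within 100 * f64::EPSILON" as an absolute tolerance is an assumption; the commit message does not say whether the bound is absolute or relative.

```rust
/// Scalar reference for L1: sum of absolute differences.
pub fn l1_scalar(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// Scalar reference for L2: square root of the sum of squared differences.
pub fn l2_scalar(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}

/// Scalar reference for Linf: largest absolute difference (0.0 if empty).
pub fn linf_scalar(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f64::max)
}

/// Assumed reading of the tolerance: absolute error of at most 100 * EPSILON.
pub fn within_tolerance(got: f64, want: f64) -> bool {
    (got - want).abs() <= 100.0 * f64::EPSILON
}
```

A SIMD result would be passed as `got` and the scalar reference as `want`; an absolute bound like this is only meaningful for small-magnitude inputs.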

All 15 heel_f64x8 tests pass (12 prior + 3 new). docs/hpc-stability.md
in this same branch now describes real APIs, not promises.

Worker: W-ND-3. Wave-2 of the four-repo integration.

Addresses PP-13 HOLD finding: docs/hpc-stability.md framed
l1_f64_simd/l2_f64_simd/linf_f64_simd as "aspirational reserved
names" because they were absent from source at wave-1 time. Wave-2
commit 71cdbd4 materialised all three; the doc now opens with a
prepended status block stating the freeze commitment is load-bearing,
not aspirational. The "Stable public surface" table below the
banner now describes REAL APIs.

Per PP-13 savant audit. Wave-3 of the four-repo integration.

CI `format/stable` job flagged `cargo fmt --check` diff: the new
`linf_picks_the_largest_gap` test (commit 71cdbd4) had a multi-line
`assert!(...)` invocation. rustfmt prefers it on one line. Auto-applied
via `cargo fmt`.

No code change. CI signal: PR #160 format/stable.
@AdaWorldAPI AdaWorldAPI merged commit 697fb96 into master May 19, 2026
15 checks passed
AdaWorldAPI added a commit that referenced this pull request May 19, 2026
Revert #160 — HPC stability docs + l1/l2/linf kernels (four-repo arc)
AdaWorldAPI pushed a commit that referenced this pull request May 19, 2026
…vert note

Closes the architectural synthesis arc with three additions to the
consolidation doc + one companion flex prompt:

1. Four-tier picture (Cognitive / Analytic / Search / Graph): three of
   four legacy Bardioc layers have pre-existing Rust-native successors
   (Databend, Tantivy, lance-graph) that aren't HHTL. HHTL only has to
   win the cognitive layer it was designed for. Migration scope shrinks
   proportionally.

2. "Why we don't transcode ClickHouse" section: full transcode is 5-10
   engineer-years (TiKV / Servo / CockroachDB reference points). Three
   cheaper escape hatches enumerated; path C (adopt Databend +
   ndarray::simd) recommended over path A (FFI inject) or path B
   (executor-only transcode). C# RavenDB / EventStoreDB ecosystem
   analog noted.

3. PR #404 reference updated to reflect 2026-05-19 rollback: code
   attempt withdrawn, architectural intent preserved as next-cycle target.

Companion flex prompt: databend-ndarray-simd-prompt.md. 24-hour budget
(half the trojan-horse prompt since Databend is already Rust-native, no
FFI bridge). Three-engine benchmark target (stock ClickHouse + stock
Databend + ndarray-Databend) against TPC-H + ClickBench + cognitive
mini-workload. Sits at path C in the four-prompt strategic arc:
1. bardioc-weekend-rebuild (baseline) — measure honest legacy
2. stack-consolidation (this doc) — strategic frame
3. ndarray-simd-trojan-horse (path A) — FFI inject ClickHouse + Tantivy
4. databend-ndarray-simd (path C, this prompt) — adopt Rust-native successor

No code changes; pure strategy docs. Branch already in master via PR #159
merge (not affected by #160 / #161 revert chain).
AdaWorldAPI pushed a commit that referenced this pull request May 19, 2026
Two layered updates from the post-PR #404-rollback session, both folded into
the existing consolidation + PR-X10 docs so PR #162 carries them.

## Layer 1 — PR #404 / PR #160 rollback salvage

### `heel_f64x8::{l1,l2,linf}_f64_simd` → PR-X10 A6 `linalg::distance`

The distance kernels were correct; the framing was wrong (filed as
"Sprint 0a of a four-repo integration arc" with cross-repo coupling that
made the rollback inherently cross-repo). Re-emerges as
`ndarray::hpc::linalg::distance::{l1,l2,linf}_f64_simd` under worker A6.

- `pr-x10-linalg-core-design.md`: added `distance.rs` to the module tree,
  new section "Distance kernels — `linalg::distance`" with API surface +
  precision class, A6 worker row updated (~400 → ~500 LoC, files now
  include `linalg/distance.rs`)
- `stack-consolidation-bardioc-to-hhtl.md`: new "Salvage from the
  2026-05-19 cross-repo rollback (PR #404 / PR #160)" section names the
  re-entry point + lesson for future cross-repo arcs

### `lance-graph-contract::{ir, provider, actor}` → mostly dead, except…

`Operator`, `Cardinality`, `EngineHint`, `MvccProvider` types are
correctly dead (HHTL covers natively).

**Exception**: `SupervisableShader` + `RestartBackoff` reserved as
*column-flip-cycle commitment-gate primitives* for a future PR-X14 in
lance-graph. They wrap Ractor handlers that own a column-flip cycle
(read column → compute → flip state-flag → reply / drop). Re-framed
below under the column-substrate-identity model — they encode
flip-cycle semantics, not cross-store boundary plumbing.

## Layer 2 — Column-substrate identity (the deeper reframe)

The post-rollback session also produced an architectural collapse that
supersedes parts of the existing consolidation doc. Encoded across
three sections:

### New § "Column-substrate identity — Lance ≡ Arrow ≡ ndarray SoA"

One physical representation, end-to-end. Lance column ≡ Arrow column
buffer ≡ ndarray SoA — same bytes viewed through three names. Every
dialect surface (lance-graph cascade, SurrealDB, sea-orm, Databend,
Tantivy) parses its query language down to operations on those same
bytes. ndarray pays for the SIMD primitive once; the whole stack
collects rent.

Rubicon = *column-state flip*, not write event. A Thought is a Lance row
from allocation to query by any surface. "Crossing the Rubicon" means
flipping (e.g.) `committed: false → true` — versioned natively by Lance,
observed by any LIVE watcher with a matching predicate, no serialisation.
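
A toy sketch of the column-flip idea, under heavy assumptions: the `ThoughtColumns` struct, its fields, and the `watch` method are all hypothetical, plain `Vec` columns stand in for Lance/Arrow buffers, and no Lance or SurrealDB API is used. It only shows that "crossing" is a flag flip on a row that never moves, and that a watcher is a predicate over the flag column.

```rust
/// Hypothetical struct-of-arrays table: one Vec per column, one index per row.
#[derive(Default)]
pub struct ThoughtColumns {
    pub score: Vec<f64>,
    pub committed: Vec<bool>,
}

impl ThoughtColumns {
    /// Allocate a row in the uncommitted state and return its index.
    pub fn allocate(&mut self, score: f64) -> usize {
        self.score.push(score);
        self.committed.push(false);
        self.score.len() - 1
    }

    /// Flip `committed: false -> true` in place.
    /// Returns true only if this call performed the flip.
    pub fn commit(&mut self, row: usize) -> bool {
        let was_committed = self.committed[row];
        self.committed[row] = true;
        !was_committed
    }

    /// Stand-in for a LIVE watcher: rows whose flag matches the predicate.
    pub fn watch<P: Fn(bool) -> bool>(&self, pred: P) -> Vec<usize> {
        self.committed
            .iter()
            .enumerate()
            .filter(|(_, c)| pred(**c))
            .map(|(i, _)| i)
            .collect()
    }
}
```

The row index and the `score` bytes are identical before and after `commit`; only the flag column changes.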

Section includes:
- The full Lance/Arrow/ndarray-SoA diagram with the five dialect surfaces
- "What this dissolves" table — 7 earlier framings now superseded
  (mailbox writes, MvccProvider threading, surrealdb-ractor cf-event,
  sea-orm entity-actor dispatch, Zone-as-storage-tier, TiKV-as-routing,
  kv-lance-as-translation)
- "What survives — JITson / Cranelift, cleaner than before" — the
  compile-time → JIT pipeline (DeriveEntityModel → Cranelift kernel
  specialisation against OGIT-derived column types; ontology evolution
  triggers next compile cycle → all surfaces auto-inherit)
- "Implication for the four-tier picture" — the substrate claim becomes
  load-bearing in the right way; column IS the SoA IS the ndarray buffer

### Zone-model section rewritten

Zones are now defined as **temporal phases of column state on a single
Lance dataset**, not storage tiers. Table columns: column-state phase
(`committed=false` / `committed=true` / `egressed_at IS NOT NULL`),
which surface watches each phase, what "being in this zone" means.
Same physical bytes throughout — a row does not "move" from zone 1 to
zone 2; a column flips and the LIVE watchers notice. Section ends
pointing at § "Column-substrate identity" for the full unification.
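
Read this way, a zone is a pure function of column state. The sketch below uses hypothetical names (`Zone`, `zone_of`) and assumes `egressed_at` is a nullable integer timestamp; letting a non-null `egressed_at` take precedence over `committed` is also an assumption, not something the doc specifies.

```rust
/// Hypothetical names for the three column-state phases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Zone {
    Uncommitted, // committed = false
    Committed,   // committed = true
    Egressed,    // egressed_at IS NOT NULL
}

/// Derive the zone from column values alone; nothing is moved or copied.
/// Assumption: a non-null `egressed_at` wins over the `committed` flag.
pub fn zone_of(committed: bool, egressed_at: Option<u64>) -> Zone {
    match (committed, egressed_at) {
        (_, Some(_)) => Zone::Egressed,
        (true, None) => Zone::Committed,
        (false, None) => Zone::Uncommitted,
    }
}
```

Since `zone_of` takes only column values, the same row yields a different zone after a flip without any change to where it is stored.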

### Click-moments inventory: three → four

Click-moment #2 (Ractor `&mut self`) gets a refinement note about
mailbox-cycle Rubicon (no physical boundary). New click-moment #4 —
"Multi-store consistency / cross-zone messaging looked like the hard
coordination problem → column-substrate identity shows there is no
cross-zone messaging." Concluding paragraph distinguishes the three
workload-shape dissolutions (#1-3) from the substrate-identity
dissolution (#4) which makes the others' "no copy, no marshal, no
coordination" claims literal.

### Salvage section's SupervisableShader framing updated

The earlier "zone-1↔zone-2 boundary" language was already wrong twice
in this PR; final framing under column-substrate identity: these are
column-flip-cycle commitment primitives. Lance's version chain provides
the natural retry semantics. The handler's "supervision boundary" is
the flip-cycle, not a perimeter — because there is no second store.

## Status after this commit

- PR #162 now carries: Phase 2 entry artifacts (canary plan + execution
  prompt + PR-X10 verdict patch P1-1) AND PR #404 rollback salvage AND
  column-substrate-identity reframe
- All four click-moments documented; framing across Zone model,
  Click-moments inventory, and Salvage section is consistent
- PR-X10 A6 absorbs heel_f64x8 distance kernels with bench parity gate
- Re-entry path for SupervisableShader + RestartBackoff named
  (future PR-X14 in lance-graph; first consumer is the NARS-revision
  handler that flips `revised: false → true` per column-flip semantics)