diff --git a/.claude/knowledge/databend-ndarray-simd-prompt.md b/.claude/knowledge/databend-ndarray-simd-prompt.md new file mode 100644 index 00000000..dfce5a2b --- /dev/null +++ b/.claude/knowledge/databend-ndarray-simd-prompt.md @@ -0,0 +1,246 @@ +# Databend + ndarray::simd — Claude Code Flex Prompt + +Adopt Databend as the Rust-native ClickHouse successor and inject `ndarray::simd` +into its hot kernel paths. This is the **recommended ClickHouse-tier +migration target** per `stack-consolidation-bardioc-to-hhtl.md` (path C: 0 +transcode cost, weeks not years to OLAP parity). + +Companion to: +- `ndarray-simd-trojan-horse-prompt.md` (path A — FFI into stock ClickHouse, + buys time during cutover) +- `bardioc-weekend-rebuild-prompt.md` (the baseline measurement target) + +Copy the block below into a fresh Claude Code session. Authorize +`--allowed-tools '*'`, Rust 1.94, Docker. + +Budget: 24 hours wall-clock (half the trojan horse — Databend is already +Rust-native, no FFI bridge to build). + +--- + +```text +You are integrating `ndarray::simd` (from adaworldapi/ndarray, AVX-512 default, +`target-cpu=x86-64-v4`) into Databend (datafuselabs/databend, Rust columnar +OLAP with its own vectorized query engine on Tokio, Apache-2.0 core with Elastic-2.0 enterprise features). The deliverable is a +fork that swaps Databend's SIMD code paths for ndarray::simd primitives, +benchmarks against stock Databend AND stock ClickHouse, and produces a +report comparing all three. + +This is path C from the consolidation: Databend is the recommended +ClickHouse successor for the AdaWorldAPI stack's analytic tier. Bardioc's +ClickHouse decommissions when Databend + ndarray::simd reaches parity on +the OLAP workloads that matter. + +Spawn 8 parallel workers + 1 coordinator. Git worktrees per worker. Branch: +`databend-simd/{role}-{id}`. Integration via docker-compose stand-up of +three OLAP engines side-by-side. + +## Why Databend, not transcode ClickHouse + +Full ClickHouse transcode is 5–10 engineer-years. 
Databend is: +- Rust-native (no FFI bridge needed) +- Own vectorized query engine with Arrow interop + Tokio (compatible with the wider Rust ecosystem) +- ClickHouse-shape SQL dialect (much of TPC-H ports unchanged) +- Apache-2.0 core, Elastic-2.0 enterprise features (confirm each swapped file is Apache-2.0 before integrating with the AdaWorldAPI codebase) +- Already maintained by a funded team (datafuselabs) +- Smaller hot kernel surface than ClickHouse — fewer kernels to swap + +Trade-off accepted: Databend's storage format is not ClickHouse-wire-compatible. +The migration plan is workload-by-workload re-ingestion from Bardioc Cassandra +into Databend, not in-place storage swap. Acceptable because Bardioc cutover +already involves dual-write phases (see bardioc-weekend-rebuild-prompt.md). + +## Databend SIMD injection targets + +Fork Databend at the current stable tag. Add ndarray as a workspace dep. +Replace target SIMD paths with ndarray::simd calls. Tests stay; benches add. + +Priority order (most-impact kernels first): + +1. **`src/query/expression/src/kernels/filter.rs`** — column filter + `mask & column` and packed-int boolean evaluation → + `ndarray::simd::filter_apply_mask` +2. **`src/query/functions/src/aggregates/aggregate_sum.rs`** + `avg.rs` + + `min_max.rs` → `ndarray::simd::reduce_{sum,min,max,mean}` for all + numeric types (f32, f64, i32, i64, u32, u64) +3. **`src/query/expression/src/kernels/hash.rs`** — hash-table probing for + joins and group-by → `ndarray::simd::hash_xxh3_batch` +4. **`src/query/functions/src/scalars/comparison.rs`** — column-vs-column and + column-vs-literal `< == >` → `ndarray::simd::compare_{lt,eq,gt}` +5. **`src/query/expression/src/kernels/take.rs`** — gather operations for + selection vectors → `ndarray::simd::gather_{f32,f64,u32,u64}` +6. **`src/common/storage/parquet/`** — parquet decode hot path (bitpack + + RLE) → `ndarray::simd::{bitpack_decode,rle_decode}` +7. 
**`src/query/functions/src/scalars/string/`** — substring / position + functions → `ndarray::simd::substring_find` + +Databend test suite is comprehensive — `cargo test --workspace` must pass +unchanged after each swap. SIMD primitives that don't exist yet in +ndarray::simd: document the gap and skip the kernel (becomes a follow-on +ndarray PR under the W1a consumer contract). + +## Worker split (8 + coordinator) + +| Worker | Target | Role | +|---|---|---| +| W1 | Fork + dep wiring | Fork Databend at stable tag; add ndarray dep; CI setup; bench harness skeleton | +| W2 | Kernel 1 (filter) | Filter / mask kernel swap + parity tests + bench vs stock | +| W3 | Kernel 2 (aggregates) | Sum/avg/min/max for all numeric types + bench | +| W4 | Kernel 3 (hash) | Hash-table probing + group-by + join hash + bench | +| W5 | Kernel 4 (comparison) | Comparison ops + bench | +| W6 | Kernel 5 + 6 (take + parquet) | Gather + parquet decode + bench | +| W7 | Kernel 7 (string) | Substring / position + bench | +| W8 | Three-way bench | docker-compose: stock ClickHouse + stock Databend + ndarray-Databend; identical workload; report generator | + +Coordinator: integration testing, cherry-pick to main branch, docker-compose +orchestration, REPORT.md generation. + +## Benchmark workload + +Run THREE engines against the SAME workload: +- **Stock ClickHouse** (reference performance — the bar to beat or match) +- **Stock Databend** (current Rust-native baseline) +- **ndarray-Databend** (the fork from this prompt) + +Workloads: +1. **TPC-H scale factor 10** — Q1, Q3, Q6, Q14 (these stress the kernels + we swapped: filter, agg, join, group-by). Standard benchmark, comparable + across the industry. +2. **ClickBench** — datafuselabs' adapted ClickHouse benchmark, ~40 queries + on a real web-analytics dataset. Directly designed for ClickHouse-vs-X + comparison. +3. 
**Cognitive analytics mini-workload** — 100 ad-hoc queries over a + synthetic NARS-revision log (joins, time-bucketing, top-K aggregation). + This represents the actual operational-analytics queries the AdaWorldAPI + stack will run against egressed cognitive state. + +Report per engine: +- p50 / p95 / p99 query latency per query +- Cold-cache vs warm-cache latency +- CPU instructions retired (`perf stat`) +- Peak memory +- Indexing/ingestion throughput + +Output: `./benchmarks/REPORT.md` with three-column comparison tables. + +## Acceptance criteria + +Per kernel swap: +1. Bit-exact parity for integer, ULP-bounded for float +2. Within 5% of stock Databend OR faster +3. Existing Databend test suite passes (`cargo test --workspace`) + +Per engine: +1. All TPC-H + ClickBench queries return correct results on all three + engines (cross-validate ClickHouse ↔ Databend ↔ ndarray-Databend) +2. ndarray-Databend ≥ stock Databend on geomean latency +3. ndarray-Databend within 2× of stock ClickHouse on geomean latency (the + migration story is "Rust-native parity at acceptable cost", not + "beat ClickHouse on every query") + +If ndarray-Databend beats ClickHouse on ANY query: that's a major signal, +call it out in REPORT.md. + +## Anti-goals + +- Do NOT add new ndarray::simd primitives this weekend. If a kernel needs a + missing primitive, document the gap and skip the kernel. The gap becomes + a follow-on ndarray PR. +- Do NOT submit upstream PRs to Databend this weekend. The deliverable is + the validated fork + benchmark report. Upstream contribution is a + separate follow-on after numbers are clean and reviewed. +- Do NOT introduce nightly Rust. Databend builds on stable; keep it that way. +- Do NOT optimize Databend's planner / SQL parser / catalog. The point is + kernel-level SIMD swap, not architecture work. +- Do NOT touch HHTL substrate (PR-X4, PR-X9). This is independent OLAP-tier + work; HHTL is the cognitive-tier work. 
+ +## Time budget (24 hours) + +| Hour 0-2 | W1: fork + dep wiring + bench harness skeleton | +| Hour 2-12 | W2-W7 in parallel: kernel swaps + per-kernel benches | +| Hour 12-18 | W8: three-way docker-compose stack + ClickBench run | +| Hour 18-22 | Cognitive mini-workload + report generation | +| Hour 22-24 | REPORT.md write-up + handoff | + +If a kernel doesn't reach parity in its allotted window, document the gap +and skip. Honest negatives are also data — they tell us which ndarray::simd +primitives need follow-on work. + +## Strategic outcomes (what the report unlocks) + +1. **Migration target validated**: if ndarray-Databend reaches Databend + parity AND is within 2× of ClickHouse on TPC-H + ClickBench, the + consolidation doc's "Databend is the ClickHouse successor" claim is + evidenced rather than asserted. + +2. **Three-engine reference point**: future Databend or ClickHouse PRs can + re-run this exact harness and see whether ndarray::simd injection is + still worth it. Living benchmark, not a one-shot report. + +3. **Cognitive-tier evidence**: the cognitive mini-workload demonstrates + that Databend handles the actual operational-analytics queries the + AdaWorldAPI stack will issue (post-cognitive egress to SQL). If those + queries are sub-second on ndarray-Databend, the analytics tier is + solved without further work. + +4. **ndarray::simd cross-validation**: kernels validated against TWO + engines (Databend benchmarks plus the trojan-horse ClickHouse-via-FFI + benchmarks) is much stronger evidence than either alone. The + intersection set (kernels both engines stress the same way) becomes the + ndarray::simd "battle-tested" subset. + +5. **Decommission timeline**: Bardioc ClickHouse can be decommissioned + per-workload when ndarray-Databend passes the relevant cognitive + mini-workload subset, not all at once. Risk-bounded cutover. + +Begin. 
Report progress every 4 hours with kernel done / in-progress / +blocked + parity pass-fail + perf delta vs stock Databend AND stock +ClickHouse. +``` + +--- + +## Notes for using this prompt + +- Databend builds clean on Rust 1.94 stable. ~10 min full build, ~30s + incremental. No CMake, no JVM, no FFI bridge — pure Cargo. +- ClickHouse stand-up via official docker image (`clickhouse/clickhouse-server`). +- Databend has an official docker image too (`datafuselabs/databend`). +- ClickBench dataset is ~14GB compressed; provision disk accordingly. +- TPC-H generation via `dbgen`; scale factor 10 produces ~10GB. +- The cognitive mini-workload is the most important — it's the only one + that's actually shaped like AdaWorldAPI's real future queries. + +## Composition with other prompts + +This prompt sits inside the four-prompt strategic arc: + +1. **`bardioc-weekend-rebuild-prompt.md`** — build the OLD stack honest + (migration baseline measurement target) +2. **`stack-consolidation-bardioc-to-hhtl.md`** — the architectural reframe + doc (why the NEW stack wins, four-tier picture) +3. **`ndarray-simd-trojan-horse-prompt.md`** — path A: inject ndarray::simd + INTO the legacy stack (ClickHouse + Tantivy via FFI) — buys time during + cutover, accelerates legacy +4. 
**`databend-ndarray-simd-prompt.md`** (this) — path C: adopt the + Rust-native CLICKHOUSE-shape successor with ndarray::simd injection — + the actual migration TARGET + +Combined timeline: +- Weekend 1: prompt 1 (Bardioc baseline) +- Weekend 2: this prompt (Databend integration) +- Weekend 3: prompt 3 (trojan horse — optional, buys cutover time) +- Ongoing: HHTL development (PR-X4 + PR-X9), workload-by-workload cutover + +## Follow-on opportunities (NOT this weekend) + +- Upstream PR cadence to Databend: 1 PR per parity-or-better kernel; faster + cycle than ClickHouse because Rust-native (no FFI review burden) +- Polars integration: same ndarray::simd primitives plug into Polars + DataFrame ops; weekend follow-on +- DataFusion integration: arrow-rs has SIMD for filter/take/aggregate; + ndarray::simd could plug in there too, benefiting the entire + DataFusion-derived ecosystem (GreptimeDB, InfluxDB IOx, Ballista) +- Quickwit integration: combines Tantivy trojan horse + Databend analytics + in one operational stack diff --git a/.claude/knowledge/hhtl-canary-inhabitance-plan.md b/.claude/knowledge/hhtl-canary-inhabitance-plan.md new file mode 100644 index 00000000..fb37a8f2 --- /dev/null +++ b/.claude/knowledge/hhtl-canary-inhabitance-plan.md @@ -0,0 +1,229 @@ +# HHTL Canary Inhabitance Plan + +Date: 2026-05-19 +Status: Phase 2 entry condition — names the canary workload for the 6-sprint substrate arc +Companion docs: +- `stack-consolidation-bardioc-to-hhtl.md` (architectural frame) +- `pr-master-consolidation.md` (6-sprint plan) +- `pr-master-consolidation-savant-verdict.md` (Phase 1 verdict — READY-WITH-DOC-FIXES, all patches applied) +- `hhtl-substrate-execution-prompt.md` (Phase 2 execution flex prompt — sibling to this doc) + +## Why this doc exists + +The strategic arc proves the new architecture wins **on paper**. The 6-sprint +plan moves PR-X4 + PR-X9 from **design to substrate**. 
Neither artifact answers +the question the substrate has to answer to count as **inhabited**: when does +one specific cognitive query path *run end-to-end on the new architecture using +the new idioms*? + +This doc names the canary. The canary is what closes the gap between +"substrate exists" and "substrate is lived in." + +## The canary: NARS revision routed through HHTL cascade + +**Workload**: a NARS belief revision triggered by a perceptual surface, routed +through the splat4d cascade to the relevant basin, materializing the basin +codebook entry on demand, returning a revised `TruthValue` via the Rubicon +commit gate, persisted to SurrealDB through a typed-surface adapter. + +**Why this workload**: +- It is **architecturally pure** — exercises every load-bearing piece of the + new substrate (cascade, codebook, Rule #3, Rubicon, per-thought bindspace, + typed surfaces, zone-1↔2 boundary, ndarray::simd kernels) +- It is **real** — NARS revision is a primary cognitive workload, not a + synthetic benchmark; the existing Bardioc stack runs it constantly +- It is **measurable** — has a scalar reference implementation in + `src/hpc/nars.rs` to compare against for correctness +- It is **scoped** — one query path, not a system migration; can be + retracted without affecting parallel sprint work +- It is **representative** — the result generalizes: if revision-via-HHTL + works, every other cascade-routed cognitive op works the same way + +## What "routed through HHTL" concretely means + +Each step exercises a specific substrate primitive. This is the inhabitance +checklist — not the implementation order: + +| Step | Substrate piece | Rule / discipline | +|---|---|---| +| 1. Perceptual surface arrives at a Ractor mailbox | Ractor as Rubicon gate (not Erlang) | Per-thought bindspace begins on mailbox entry | +| 2. Surface → `Base17` typed wrapper | ndarray::hpc::cognitive (PR-X9) | Typed surface, not DTO | +| 3. 
`CascadeAddr::from_position` Hilbert-3D encode | PR-X10 A12 hilbert.rs | Deterministic, no shared state | +| 4. Cascade L1 XOR projection | PR-X4 splat4d cascade | Single XOR + table-addressing, no scan | +| 5. Cascade L2-L4 hops | PR-X4 splat4d cascade | Each hop = 1 XOR; total ≤ 4 hops | +| 6. Basin lookup at leaf address | PR-X9 LazyBlockedGrid | Lazy: codebook present → return; absent → materialize | +| 7. Basin materialization (cold path only) | PR-X12 codec (rANS decode) | Decode under the Rubicon write-back gate, not during cascade | +| 8. NARS revision over (existing truth, new evidence) | hpc::nars existing | Pure function: returns new `TruthValue`, no `&mut self` | +| 9. Rubicon commit | Ractor handler `&mut self` is the legitimate gated write | Single committed outcome per mailbox message | +| 10. Zone-1↔2 boundary crossing | sea-orm at zone 3 (only if egressing); SurrealDB at zone 2 | Typed surface in, ACID-tx out, materialization once | +| 11. Per-thought bindspace dies | Message lifetime | No global registry retained | + +Eleven steps, one query path, four hops, sub-microsecond worst case (claimed). +The canary either reaches that envelope or the architecture is wrong. + +## Measurement gates + +The canary passes Phase 2 when **all** of the following hold on a Zen4 or +Sapphire Rapids 8-core box, AVX-512 enabled (`target-cpu=x86-64-v4`): + +### Correctness gates (binary) + +1. **Revision output matches scalar reference**: + - `Fingerprint` (u64) bit-exact match against `src/hpc/nars.rs::revise` + - `TruthValue` (f, c) within ULP ≤ 4 of scalar reference + - 10,000 randomly-seeded revisions, zero divergences allowed +2. **Cascade routing is deterministic**: + - Same `(Base17, position)` → same `CascadeAddr` across runs + - Same `CascadeAddr` → same basin entry (warm cache or cold-materialized) + - Bit-exact reproducibility across 100 runs +3. 
**No `&mut self` during compute** (compile-time enforcement): + - `ndarray::hpc::cognitive::*` engines have `revise(&self, ...) -> Result` + - Only Ractor handlers carry `&mut self` and only for commit, never compute + - Clippy lint `clippy::needless_pass_by_ref_mut` clean +4. **Per-thought bindspace is per-thought**: + - No `static`/`lazy_static`/`OnceLock` carrying mutable cognitive state + inside zone 1 — audited by grep + sentinel-qa review +5. **Typed surfaces at zone boundaries**: + - Zone 1 → zone 2: `ndarray::hpc::*` types, no `serde_json::Value`, no + `HashMap>`, no DTO layer + - Zone 2 → zone 3: `sea-orm` ActiveModel, materialization exactly once + +### Performance gates (numeric) + +1. **p99 revision latency** (warm cache, cascade depth ≤ 4): + ≤ **1.5 µs** (target 700 ns mean per the HHTL claim; allow 2× headroom on p99) +2. **p99 revision latency** (cold cache, includes basin materialization): + ≤ **15 µs** (codec decode + cascade + revision; rANS decode dominates) +3. **Cascade-only latency** (excluding revision math): + ≤ **400 ns p99** (4 XOR hops + 4 table addressings) +4. **Codebook hit rate after 1M revisions warmup**: + ≥ **95%** (sparse basins not pre-materialized; popular cells warm fast) +5. **Throughput, saturated**: + ≥ **1M revisions/sec** per core sustained over 10 seconds (~1 µs amortized) +6. **Working set per worker thread**: + ≤ **1 MB** (fits L2 cache on Zen4/SPR) +7. **ndarray::simd primitive coverage**: + 100% of hot-path SIMD ops route through `ndarray::simd::*` — zero raw + intrinsics in the cognitive path (enforced by clippy lint and the W1a + consumer contract gate) + +### Inhabitance gates (qualitative) + +1. **The canary path reads like the architecture document.** A new reader + should be able to trace each of the 11 steps above to a specific function + in the codebase. If the code is more complex than the architecture + description, the architecture didn't get inhabited — a translation + layer got built. +2. 
**No "Bardioc-shaped" code in the canary path.** No SQL builders for + the lookup, no Elasticsearch-shaped query DSL, no JanusGraph-shaped + traversal, no ClickHouse-shaped aggregation. The cascade is the lookup; + the codebook is the storage; the Rubicon is the commit. If any + step reaches for a legacy idiom, the canary has not inhabited. +3. **The canary survives a sentinel-qa audit** with zero P0 SAFETY findings + on the new code (existing scalar reference is grandfathered). +4. **The integration sprint produces a 30-second screen recording** showing + the canary running end-to-end, p99 latency on screen, codebook hit + rate climbing during warmup. Recording is committed to the repo. + +## What is NOT the canary + +Explicit anti-scope so the canary doesn't drift into a system migration: + +- **Not**: a full Bardioc → HHTL stack swap +- **Not**: a multi-workload benchmark suite +- **Not**: a SQL or graph-query analog of NARS revision +- **Not**: production cutover from Bardioc +- **Not**: a UI demo +- **Not**: a research artifact about HHTL theory — the canary is the + *operational* proof, not a paper + +If the canary works, Bardioc cutover is a follow-on per-workload migration +that can take months. The canary just has to demonstrate inhabitability of +*one* path. 
+ +## Where the canary lives + +| Component | Crate / path | Sprint | +|---|---|---| +| `Base17` + `Fingerprint` + `TruthValue` types | `ndarray::hpc::{nars,fingerprint,base17}` (existing) | — (pre-existing) | +| `Hilbert3D::{encode,decode}` | `ndarray::hpc::linalg::hilbert` | PR-X10 A12 | +| `CascadeAddr` + `from_position` + `XorProjection` | `ndarray::hpc::splat4d::cascade` | PR-X4 | +| `SplatPyramid, BR, BC>` | `ndarray::hpc::splat4d::pyramid` | PR-X4 + PR-X9 (GridStorage is PR-X9) | +| `BasinCodebook` + `LazyBlockedGrid` | `ndarray::hpc::cognitive::{codebook,storage}` | PR-X9 | +| rANS encode/decode + `CellMode` + `rdo_cell` | `ndarray::hpc::codec::*` | PR-X12 | +| Per-pillar PASS gates (revision math certified) | `ndarray::hpc::pillar::*` | PR-X11 | +| OGIT cognitive namespace bridge | `ndarray::hpc::ogit_bridge::*` | PR-X13 | +| Ractor Rubicon gate (`RevisionHandler`) | `lance-graph::cognitive::nars_actor` (new) | Integration sprint | +| SurrealDB egress (zone 2 typed surface) | `lance-graph::cognitive::nars_persist` (new) | Integration sprint | +| End-to-end canary binary | `lance-graph/examples/nars_canary.rs` (new) | Integration sprint | +| Measurement harness | `lance-graph/benches/nars_canary.rs` (new) | Integration sprint | + +The integration sprint produces the two `lance-graph::cognitive::*` modules +that wire the substrate pieces together. The wiring is small (~200 LoC each); +the substrate pieces are the work. 
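The wiring described above splits into a pure compute step and a gated commit step (checklist steps 8 and 9). A minimal Rust sketch of that split follows. It is an illustration under stated assumptions, not the `lance-graph::cognitive::nars_actor` code: the `TruthValue` fields, the `revise` signature, the `RevisionHandler` layout, and the evidential horizon `K = 1.0` are all hypothetical stand-ins, and the formula is the textbook NARS revision rule rather than whatever `src/hpc/nars.rs` actually implements.

```rust
/// Hypothetical stand-in for the real `TruthValue` (frequency, confidence).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TruthValue {
    f: f64,
    c: f64,
}

/// Evidential horizon; 1.0 is an assumed value for this sketch.
const K: f64 = 1.0;

/// Pure revision: takes values, returns a value, touches no shared state.
/// Assumes both confidences lie strictly between 0 and 1.
fn revise(a: TruthValue, b: TruthValue) -> TruthValue {
    // Convert each confidence to an evidence weight: w = K * c / (1 - c).
    let w1 = K * a.c / (1.0 - a.c);
    let w2 = K * b.c / (1.0 - b.c);
    let w = w1 + w2;
    TruthValue {
        f: (w1 * a.f + w2 * b.f) / w, // evidence-weighted frequency
        c: w / (w + K),               // pooled evidence back to confidence
    }
}

/// Hypothetical handler: the only place `&mut self` appears.
struct RevisionHandler {
    committed: TruthValue,
    commits: u64,
}

impl RevisionHandler {
    /// One mailbox message in, one committed outcome out.
    fn handle(&mut self, evidence: TruthValue) -> TruthValue {
        let revised = revise(self.committed, evidence); // compute: pure
        self.committed = revised; // commit: the single gated write
        self.commits += 1;
        revised
    }
}
```

Keeping `revise` free of `&mut` means the compute step can be tested against the scalar reference in isolation, while the handler stays small enough to audit by eye.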
+ +## Composition with the 4-prompt strategic arc + +| Strategic prompt | Role | Canary relationship | +|---|---|---| +| `bardioc-weekend-rebuild-prompt.md` | Baseline measurement (legacy) | Produces the **NARS-revision-on-Bardioc** number the canary beats | +| `ndarray-simd-trojan-horse-prompt.md` | Path A: ClickHouse + Tantivy FFI inject | **Independent** — analytic tier, not cognitive | +| `databend-ndarray-simd-prompt.md` | Path C: Rust-native ClickHouse successor | **Independent** — analytic tier, not cognitive | +| **THIS DOC + `hhtl-substrate-execution-prompt.md`** | Cognitive tier — the actual architectural win | Canary measures **revision-on-HHTL** vs the Bardioc baseline | + +The four-prompt arc handles the **analytic tier** (where ClickHouse used to +live). This canary handles the **cognitive tier** (where HHTL lives). They +compose: the analytic tier is Bardioc's escape hatch; the cognitive tier is +the architecture's reason to exist. + +Both must work for the consolidation to be real. The cognitive canary is +the harder and more important one. + +## Pass/fail decision + +If the canary passes all gates: HHTL is **inhabited**. Bardioc cognitive-tier +cutover is a per-workload migration; analytic-tier cutover follows path A +(buy time) or path C (replace). The consolidation arc is operationally +proved. + +If the canary fails **performance gates** (latency/throughput): the +architecture's algorithmic regime claim ("two orders of magnitude") is +wrong. Re-examine the cascade depth, the codebook materialization cost, +or the SIMD primitive coverage. Patch and re-measure. + +If the canary fails **correctness gates** (ULP/bit-exact): a substrate bug +exists. P0 — block all dependent sprint work until resolved. + +If the canary fails **inhabitance gates** (qualitative): the substrate +exists but isn't being lived in — the integration sprint built a +translation layer instead of using the substrate primitives. Re-write +the wiring, not the substrate. 
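The decision rules above, together with the ULP comparison that the correctness gate relies on, can be sketched in Rust as follows. This is a hedged illustration: `GateResults`, `Verdict`, `decide`, and `ulp_distance` are hypothetical names, `f32` is an assumed width for the truth-value fields, and the precedence when several gate families fail at once (correctness first, then performance, then inhabitance) is an assumption that the plan does not state.

```rust
/// Map an f32 onto a signed integer line where adjacent floats differ by 1.
/// NaN inputs are not handled in this sketch.
fn ordered_bits(x: f32) -> i64 {
    let b = x.to_bits() as i32;
    // Negative floats have the sign bit set; mirror them below zero so that
    // -0.0 and +0.0 both map to 0 and ordering matches numeric ordering.
    (if b < 0 { i32::MIN - b } else { b }) as i64
}

/// Number of representable f32 values between `a` and `b`.
fn ulp_distance(a: f32, b: f32) -> u64 {
    (ordered_bits(a) - ordered_bits(b)).unsigned_abs()
}

/// Outcome of the three gate families (illustrative field names).
#[derive(Debug, Clone, Copy)]
struct GateResults {
    correctness: bool,
    performance: bool,
    inhabitance: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Inhabited,        // all gates pass
    SubstrateBugP0,   // correctness failed: block dependent sprint work
    RegimeClaimWrong, // performance failed: patch and re-measure
    TranslationLayer, // inhabitance failed: rewrite the wiring
}

fn decide(g: GateResults) -> Verdict {
    // Assumed precedence: a correctness failure outranks everything else.
    if !g.correctness {
        Verdict::SubstrateBugP0
    } else if !g.performance {
        Verdict::RegimeClaimWrong
    } else if !g.inhabitance {
        Verdict::TranslationLayer
    } else {
        Verdict::Inhabited
    }
}
```

A harness could set `correctness` to true only when every revision satisfies `ulp_distance(got, reference) <= 4` for both truth-value fields and the fingerprint matches bit for bit.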
+ +## Sequencing + +The canary cannot be implemented until the 6 substrate sprints land (the +canary depends on PR-X4 + PR-X9 + PR-X10 A12 + PR-X11 + PR-X12 + PR-X13). +**The canary is the integration sprint deliverable**, not a parallel track. + +The 6 sprints run per the master schedule (W1-W8 in +`pr-master-consolidation.md`). Integration sprint = W8 = canary build + +measure + record + write report. + +## What changes if the canary passes + +Three things become true that aren't true today: + +1. **The architecture document stops being a claim and becomes a measurement.** + The "700ns at depth 4" claim is now a number with confidence intervals. +2. **Per-workload Bardioc cutover becomes mechanically composable.** Each + subsequent cognitive workload follows the canary pattern: typed surface + in, cascade lookup, codebook materialization, Rubicon commit, zone + boundary crossing. No new architectural decisions per workload. +3. **The four strategic prompts can be executed with confidence.** Today + they read as "buy time + measure baseline + adopt successor." After + the canary passes, they read as "execute the cutover" with the cognitive + tier already proven. + +If the canary doesn't pass, those three things stay false — and the next +session has to decide whether to debug the substrate or revisit the +architecture. diff --git a/.claude/knowledge/hhtl-substrate-execution-prompt.md b/.claude/knowledge/hhtl-substrate-execution-prompt.md new file mode 100644 index 00000000..66c9c48e --- /dev/null +++ b/.claude/knowledge/hhtl-substrate-execution-prompt.md @@ -0,0 +1,571 @@ +# HHTL Substrate Execution Prompt — Phase 2 Protocol A, 8 Weeks, 6 Sprints + +Master execution prompt for the 8-week / 6-sprint substrate build that takes +PR-X4 + PR-X9 (and their dependencies PR-X10/X11/X12/X13) from **design to +substrate**, culminating in the NARS-revision canary defined in +`hhtl-canary-inhabitance-plan.md`. 
+ +Companion docs (read first): +- `pr-master-consolidation.md` — sprint plan + dependency DAG +- `pr-master-consolidation-savant-verdict.md` — Phase 1 verdict (READY-WITH-DOC-FIXES, all 10 patches applied) +- `pr-x4-design.md`, `pr-x9-design.md`, `pr-x10-linalg-core-design.md`, `pr-x11-jc-consolidation-design.md`, `pr-x12-codec-x265-design.md`, `pr-x13-ogit-bridge-design.md` — per-sprint specs +- `hhtl-canary-inhabitance-plan.md` — the integration deliverable +- `vertical-simd-consumer-contract.md` — SIMD primitives W1a contract +- `.claude/rules/data-flow.md` — Rule #3 + +This prompt is the **copy-paste-into-fresh-session** artifact that spawns +each sprint per Protocol A. It is NOT a single Claude Code session — each +sprint kickoff is its own session (Protocol A semantics make sprints +parallelism-bounded, not session-bounded). + +--- + +## How to use this prompt + +For each sprint window in the W1-W8 schedule, copy the relevant **§ Sprint +kickoff** block below into a fresh Claude Code session. Authorize the listed +tools. The session runs the sprint per Protocol A: preflight → 6 savants → +workers → P0 fix → P2 review → merge. + +Sessions in different windows are independent and can run on different +days. Sessions within the same window (e.g. PR-X11 + PR-X13 in W3) are +independent and can run in parallel. Each sprint produces its own PR off +`claude/pr-x4-splat-cascade-design` (or successor session branches per +session policy). + +--- + +## Phase 2 Protocol A — the cadence each sprint follows + +Every sprint kickoff in the schedule below runs the same 7-step Protocol A: + +1. **Preflight skeleton** — coordinator agent writes commented-out Rust: + all impl blocks `unimplemented!()`, all types stubbed, all doc-comment + data-flow rules in place, no bodies. ~200-400 LoC depending on sprint + surface. Goal: get the API shape on the page before bodies exist. +2. 
**Parallel-savant fan-out (6 specialists, same skeleton, no collision)**: + - `savant-architect` — layering, target_feature isolation, SoA shape + - `sentinel-qa` — SAFETY claims, `unsafe` block audit + - **data-flow-savant** — Rule #3, builder exemption, &mut/&self split + - **distance-typing-savant** — typed-distance discipline (no `Box`) + - **naming-collision-savant** — symbol clashes with shipped crates + - **test-coverage-savant** — parity/property/integration test plan + Each writes a verdict against the preflight skeleton. Verdicts can be + PASS, BLOCK, or ADVISORY. BLOCK halts the sprint until resolved. +3. **Workers fill bodies** — N workers (per-sprint count below), each + owning one file, parallel where the dependency graph permits. Workers + import the preflight types; they do not edit type signatures unless a + savant explicitly demanded it. +4. **Codex P0 audit on combined diff** — runs against the whole sprint + diff once all workers report green. Codex is invoked via the existing + audit harness; output committed to `.claude/knowledge/pr-x{N}-codex-audit.md`. +5. **Coordinator fixes P0s** — every P0 must be resolved before P2 review. +6. **P2 savant pre-merge review** — joint plan-review savant with full + diff context. Output: SHIP / DO-NOT-SHIP / SHIP-AFTER-X. Committed to + `.claude/knowledge/pr-x{N}-p2-savant-review.md`. +7. **Merge, integration test, signal next sprint** — merge gates: green + `cargo test --workspace --features `, green + `cargo clippy -- -D warnings`, SHIP verdict from P2 savant. + +PR-X3 had a post-merge UB finding (overlapping `&mut [T]`). Protocol A's +data-flow-savant at step 2 would have caught that. The protocol is +load-bearing — do not skip steps. + +--- + +## Sprint sequencing — the 8-week schedule + +| Week | Sprints | Workers | Parallel? 
| +|---|---|---|---| +| W1-W2 | PR-X10 (linalg-core foundation) | 12 (max fan-out: A1 → A2-A12 parallel) | Internal max-parallel | +| W3 | PR-X11 (jc consolidation) + PR-X13 (OGIT bridge) | 6 + 4 | Yes (independent sprints) | +| W4-W5 | PR-X12 (codec) + PR-X4 (splat cascade) | 8 (effective 4-way per P1-4) + 5 | Yes (independent sprints) | +| W6-W7 | PR-X9 (basin-codebook) | 6 | Single sprint (depends on X12 + X13) | +| W8 | Integration + canary | 3 (canary build, harness, recording) | Single sprint (canary deliverable) | + +Total: 44 sprint workers + 6 coordinators + 6 specialist savants (re-used +across sprints — stateless re-roles) over 8 weeks. + +--- + +## § Sprint kickoff — W1-W2: PR-X10 (linalg-core foundation) + +```text +You are coordinator for PR-X10 (linalg-core), the foundation sprint of the +HHTL substrate arc. 12 max-fan-out workers; 2-week window; produces the +`ndarray::hpc::linalg::*` surface that every downstream sprint consumes. + +READ FIRST: +- `.claude/knowledge/pr-x10-linalg-core-design.md` — the per-worker A1-A12 + decomposition; A12 is MANDATORY Hilbert-3D per joint savant scope-cut +- `.claude/knowledge/pr-master-consolidation.md` — sprint plan + DAG +- `.claude/knowledge/pr-master-consolidation-savant-verdict.md` — P0/P1 + applied state; invariant 12 governs (master ruling: path (b)) +- `.claude/knowledge/vertical-simd-consumer-contract.md` — SIMD W1a gate +- `.claude/rules/data-flow.md` — Rule #3 + +WORKER DECOMPOSITION (12 max-fan-out): +- A1 (sequential) — `linalg/mod.rs` + `MatN` foundation +- A2 (parallel) — `linalg/quat.rs` (Quat algebra) +- A3 (parallel) — `linalg/spd.rs` (Spd2/Spd3/SpdN, sandwich ops) +- A4 (parallel) — `linalg/eig.rs` (eig_sym_3 closed-form + Jacobi general-N) +- A5 (parallel) — `linalg/svd.rs` (Golub-Reinsch + one-sided Jacobi) +- A6 (parallel) — `linalg/polar.rs` (polar decomposition) +- A7 (parallel) — `linalg/mat_exp.rs` (matrix exponential, Padé) +- A8 (parallel) — `linalg/sh.rs` (spherical harmonics deg 
0..=7) +- A9 (parallel) — `linalg/conv.rs` (Conv1d/2d/3d typed wrappers) +- A10 (parallel) — `linalg/attention.rs` (naive + flash, both ship) +- A11 (parallel) — `linalg/norm.rs` + `activations_ext.rs` + `rope.rs` +- A12 (parallel, MANDATORY) — `linalg/hilbert.rs` (Butz/Skilling 3D Hilbert + encode/decode, ~200 LoC; consumed by PR-X4 splat4d::cascade::CascadeAddr) +- Tier 3 OPTIONAL (rng/vml ext/fft ext/sparse/banded) — ship only if Tier + 1+2 finish in window; defer otherwise + +PROTOCOL A — execute the 7 steps in `hhtl-substrate-execution-prompt.md`. +The 6 specialist savants for the preflight review are listed there. + +ACCEPTANCE GATES: +- All A1-A12 mandatory items merged with green tests, green clippy, green + codex P0 audit, SHIP verdict from P2 savant +- `cargo test --workspace --features linalg` passes +- W1a consumer contract honored for every new public SIMD-touching fn +- Type aliases preserve splat3d::Spd3 for backward compat (invariant: full + type aliases ruling) +- Closed-form + general-N coexist per invariant 12 + +PR FORMAT: open one PR per worker (A1..A12), all targeting a single +integration branch `pr-x10/linalg-core`. Coordinator merges the +integration branch as one PR to master after Protocol A step 7. + +BUDGET: 2 weeks. If A1 slips, all 12 workers slip — coordinator's first +job is unblocking A1 within 48 hours. + +NEXT SPRINTS: W3 spawns PR-X11 + PR-X13 in parallel once PR-X10 merges. +``` + +--- + +## § Sprint kickoff — W3: PR-X11 (jc consolidation) + PR-X13 (OGIT bridge) + +These two sprints run **in parallel**; spawn one session each. They share +no files and have no inter-sprint dependencies. + +### PR-X11 (jc consolidation, 6 workers, 1 week) + +```text +You are coordinator for PR-X11 (jc consolidation). 6 workers; 1-week +window; moves jc's Spd2/Spd3/Wasserstein/signature/cov_high_d math into +`ndarray::hpc::pillar::*` per invariant 12. 
+ +READ FIRST: +- `.claude/knowledge/pr-x11-jc-consolidation-design.md` (Pillar-8 with + placeholder σ_temporal per joint savant P1-2) +- `.claude/knowledge/pr-master-consolidation.md` +- `.claude/knowledge/pr-master-consolidation-savant-verdict.md` +- The relevant `lance-graph/crates/jc/src/*.rs` files that move + +WORKER DECOMPOSITION (6 workers): +- B1 — `pillar/mod.rs` + Pillar-6 (Spd2 ewa_sandwich_2d, from jc) +- B2 — Pillar-7 (Spd3 ewa_sandwich_3d + koestenberger, from jc) +- B3 — Pillar-10 (Pflug Wasserstein-1, from jc/src/pflug.rs) +- B4 — Pillar-8 (temporal_sandwich, NEW; placeholder σ_temporal + + `TODO(calibrate-pillar-8-σ_temporal)` per P1-2) +- B5 — Pillar-9 (Cov16384 / cov_high_d, Düker-Zoubouloglou CLT) +- B6 — Pillar-11 (Hambly-Lyons signature transform) + +PROTOCOL A — 7 steps. + +ACCEPTANCE GATES: +- All 6 pillars implemented + probe runners shipped +- Probe PASS gates: PSD rate ≥ 0.999, log-norm concentration verifiable +- `#[deprecated]` markers added to `lance-graph/crates/jc/src/{ewa_sandwich, + ewa_sandwich_3d,koestenberger,pflug}.rs` with 1-cycle transition note +- `ndarray::hpc::pillar::*` is the canonical home; jc becomes a thin + probe-runner that imports pillar +- Pillar-8 ships with documented-arbitrary placeholder σ_temporal + + tracking issue link + +PARALLELISM: B1-B6 run in parallel after Protocol A step 1 (preflight) +lands — none of them depend on each other. Hard fan-out = 6. + +BUDGET: 1 week. The user's "12 agenten" cadence is the ceiling; this +sprint hits 6 effective because pillars are file-scoped independent. +``` + +### PR-X13 (OGIT bridge, 4 workers, 1 week) + +```text +You are coordinator for PR-X13 (OGIT embedded TTL bundle). 4 workers; +1-week window; replaces the lance-graph-ontology hop with embedded TTL +files via `include_str!` per joint savant P0-3. + +READ FIRST: +- `.claude/knowledge/pr-x13-ogit-bridge-design.md` (include_str! 
confirmed + per P0-3) +- The 26 OGIT TTL files (mirror PR-Z1's spec) + +WORKER DECOMPOSITION (4 workers): +- D1 — `ogit_bridge/mod.rs` + the trait surface +- D2 — `ogit_bridge/cognitive.rs` (per-namespace bridge for cognitive) +- D3 — `ogit_bridge/parser.rs` (Turtle parser over `include_str!` strings) +- D4 — `assets/cognitive/*.ttl` + `embedded.rs` (the 26 TTL files + + include_str! wiring; ~50 LoC + 900 lines TTL) + +PROTOCOL A — 7 steps. + +ACCEPTANCE GATES: +- `include_str!` validated UTF-8 at compile time (P0-3 ruling) +- No `include_bytes!` references anywhere in the bridge code +- TTL files baked into the binary (~150 KB compressed) +- Bridge exposes `cognitive_ttls()` returning `&'static [(name, str)]` +- Zero-startup-cost lookup (no runtime parsing for the embedded path) +- `ndarray::hpc::ogit_bridge::*` is the canonical home; lance-graph-ontology + bridge pattern deprecated + +PARALLELISM: D1 sequential (mod.rs foundation), then D2/D3/D4 parallel. + +BUDGET: 1 week. +``` + +--- + +## § Sprint kickoff — W4-W5: PR-X12 (codec) + PR-X4 (splat cascade) + +These two sprints run **in parallel**; spawn one session each. They share +no files but PR-X9 (W6-W7) depends on both. + +### PR-X12 (codec, 8 workers / 4-way effective parallel, 2 weeks) + +```text +You are coordinator for PR-X12 (x265-style codec for cognitive basin +compression). 8 workers; 4-way effective parallel per joint savant P1-4; +2-week window. 
+ +READ FIRST: +- `.claude/knowledge/pr-x12-codec-x265-design.md` — RansEncoder docstring + per P0-1; tinyvec::ArrayVec<[CtuPartition; 85]> per P0-2; A2-A5 parallel + then A6-A7 parallel then A8 sequential per P1-4 +- `.claude/knowledge/pr-master-consolidation-savant-verdict.md` + +WORKER DECOMPOSITION (8 workers, max effective 4-way): +- A1 (sequential) — `codec/ctu.rs` (Ctu carrier + CtuPartition + quad-tree) +- A2 (parallel after A1) — `codec/mode.rs` (CellMode enum, 4 modes per + P1-4 ruling: skip/merge/delta/escape) +- A3 (parallel) — `codec/predict.rs` (per-mode prediction) +- A4 (parallel) — `codec/transform.rs` (DCT-like spatial xform on cell + deltas) +- A5 (parallel) — `codec/quantize.rs` (quantization with `RdoConfig`) +- A6 (parallel after A2-A5) — `codec/rdo.rs` (λ-RDO loop + `rdo_cell`) +- A7 (parallel after A2-A5) — `codec/rans.rs` (rANS encoder; encode_symbol + has the builder-exemption docstring per P0-1) +- A8 (sequential after A7) — `codec/stream.rs` (pack/unpack stream format) + +PROTOCOL A — 7 steps. + +ACCEPTANCE GATES: +- `RansEncoder::encode_symbol(&mut self)` carries Rule #3 builder + exemption docstring (P0-1) +- `CtuPartition` quad-tree uses stack-arena pattern (tinyvec::ArrayVec or + pre-allocated Vec indexed by u16); no `Box` heap allocs + on the RDO loop hot path (P0-2) +- 4 codec modes (skip/merge/delta/escape); 5th mode (basin-shift) + collapsed into escape +- rANS chosen over CABAC (cognitive symbol skew justifies) +- `cargo test --workspace --features codec` passes +- Compression ratio ≥ 5:1 on synthetic basin codebook fixtures + +PARALLELISM: per P1-4 ruling, **4-way max** (A2-A5), not 6-way. A1 → +[A2,A3,A4,A5] → [A6,A7] → A8. + +BUDGET: 2 weeks. +``` + +### PR-X4 (splat cascade, 5 workers, 1 week) + +```text +You are coordinator for PR-X4 (splat4d temporal cascade onto BlockedGrid). +5 workers; 1-week window. 
Interim worktree path `src/hpc/splat3d/v2/` per +P1-3; public module path `crate::hpc::splat4d::*` from day one via +mod.rs re-export. + +READ FIRST: +- `.claude/knowledge/pr-x4-design.md` — module path clarification per P1-3 +- `.claude/knowledge/pr-x10-linalg-core-design.md` — A12 Hilbert-3D + consumed by `CascadeAddr::from_position` +- `.claude/knowledge/pr-master-consolidation-savant-verdict.md` + +WORKER DECOMPOSITION (5 workers): +- C1 (sequential) — `splat4d/mod.rs` + `CascadeAddr` type (4 bytes, cache- + aligned, parent/children via shift-mask) +- C2 (parallel after C1) — `splat4d/cascade.rs` (L1-L4 cascade hops; XOR + projection; consumes `linalg::hilbert::Hilbert3D::encode` from PR-X10) +- C3 (parallel) — `splat4d/pyramid.rs` (SplatPyramid, + BR, BC>; storage is generic over PR-X9's GridStorage trait, defaults + to BlockedGrid for v1) +- C4 (parallel) — `splat4d/temporal_sandwich.rs` (Pillar-8 consumer + + temporal drift sandwich) +- C5 (parallel) — `splat4d/raster.rs` (cascade-aware rasterization; + backward-compat shim wrapping splat3d::tile.rs) + +PROTOCOL A — 7 steps. + +ACCEPTANCE GATES: +- `crate::hpc::splat4d::*` reachable from day one via mod.rs re-export + (P1-3) +- CascadeAddr is 4 bytes, deterministic XOR cascade +- L1-L4 hop traversal in <400ns p99 (cache-resident path) — see + `hhtl-canary-inhabitance-plan.md` performance gate 3 +- splat3d::tile.rs becomes a shim, deprecated 1-cycle +- SplatPyramid storage-polymorphic over GridStorage (PR-X9 trait) +- `cargo test --workspace --features splat4d` passes + +PARALLELISM: C1 sequential, then C2-C5 parallel. 4-way effective. + +BUDGET: 1 week. + +NEXT SPRINT: W6-W7 spawns PR-X9 once both PR-X12 and PR-X4 merge. +``` + +--- + +## § Sprint kickoff — W6-W7: PR-X9 (basin-codebook, 6 workers, 1.5 weeks) + +```text +You are coordinator for PR-X9 (lazy basin-codebook with LazyBlockedGrid). +6 workers; 1.5-week window. Depends on PR-X12 (codec primitives) and +PR-X13 (OGIT bridge). 
Per P0-4, PR-X9 A5 uses PR-X12's codec primitives +verbatim (no codec re-implementation in this sprint). + +READ FIRST: +- `.claude/knowledge/pr-x9-design.md` — GridStorage trait with + `T: Copy, const BR, const BC` type params per P1-5; A5 narrowed scope + per P0-4 +- `.claude/knowledge/pr-x12-codec-x265-design.md` — the codec surface + PR-X9 A5 consumes +- `.claude/knowledge/pr-x13-ogit-bridge-design.md` — the OGIT cognitive + namespace PR-X9 attaches basins to + +WORKER DECOMPOSITION (6 workers): +- E1 (sequential) — `cognitive/storage.rs` (GridStorage trait + impl for BlockedGrid per P1-5 stable-1.94 fix) +- E2 (parallel after E1) — `cognitive/lazy_grid.rs` (LazyBlockedGrid: present-cells in BlockedGrid, absent-cells materialized on demand + under Rubicon write-back gate) +- E3 (parallel) — `cognitive/codebook.rs` (BasinCodebook: per-cell rANS- + encoded payload + decode-on-access cache; bounded LRU) +- E4 (parallel) — `cognitive/revise.rs` (NARS revision lifted to + GridStorage; consumes `ndarray::hpc::pillar::Pillar-7` for + certification) +- E5 (parallel) — `cognitive/encode.rs` (encode_from_dense using + `ndarray::hpc::codec::{CellMode, MergeDir, rdo_cell, RdoConfig}` per + P0-4 — no codec re-impl) +- E6 (parallel) — `cognitive/parity.rs` (BlockedGrid ↔ LazyBlockedGrid + cell-by-cell parity test harness; integration target) + +PROTOCOL A — 7 steps. 
+ +ACCEPTANCE GATES: +- GridStorage trait compiles on stable Rust 1.94 (no generic const + expressions) per P1-5 +- LazyBlockedGrid implements GridStorage with on-demand + materialization under Rubicon write-back gate (single-target gated XOR + semantics per data-flow.md Rule #3) +- Codec surface imported from PR-X12, not re-implemented (P0-4) +- BlockedGrid ↔ LazyBlockedGrid parity: per-cell L1 distance ≤ + `epsilon_floor` for any RdoConfig +- `cargo test --workspace --features cognitive` passes +- Codebook hit rate target ≥ 95% on warmed-up workload (canary gate + performance #4) + +PARALLELISM: E1 sequential (GridStorage foundation), then E2-E6 parallel. +5-way effective. + +BUDGET: 1.5 weeks. + +NEXT SPRINT: W8 integration + canary. +``` + +--- + +## § Sprint kickoff — W8: Integration + Canary (3 workers, 1 week) + +```text +You are coordinator for the integration sprint. 3 workers; 1-week window; +delivers the **NARS-revision canary** defined in +`hhtl-canary-inhabitance-plan.md`. + +This sprint is where the substrate stops being parts and becomes a system. 
+ +READ FIRST: +- `.claude/knowledge/hhtl-canary-inhabitance-plan.md` — THE canary spec + (workload, 11 substrate steps, correctness gates, performance gates, + inhabitance gates) +- `.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md` — Rubicon + model, zone boundaries, three-legged stool +- `.claude/knowledge/bardioc-weekend-rebuild-prompt.md` — the baseline + the canary measures against + +WORKER DECOMPOSITION (3 workers): +- F1 — `lance-graph/cognitive/nars_actor.rs` (~200 LoC) + - Ractor actor with mailbox = `PerceptualSurface` + - Handler = Rubicon crossing: cascade route → basin lookup → + materialize-on-cold → NARS revise → write-back via gated XOR + - Per-thought bindspace owned by the message lifetime + - `&mut self` ONLY in the handler, and only for the gated commit +- F2 — `lance-graph/cognitive/nars_persist.rs` (~200 LoC) + - Zone-1→zone-2 boundary: typed surface in (NarsBeliefRevision), + SurrealDB ACID-tx out + - Typed surface defined in ndarray::hpc::*; no DTO layer + - Zone-2→zone-3 (sea-orm SQL egress) optional and behind a feature flag +- F3 — `lance-graph/examples/nars_canary.rs` + `lance-graph/benches/nars_canary.rs` + - End-to-end binary: ingest 1M synthetic perceptual surfaces, route + through HHTL cascade, revise, commit, measure + - Bench harness: p50/p95/p99 latency (warm + cold), throughput, + codebook hit rate + - 30-second screen recording committed to repo + +PROTOCOL A — 7 steps (lightweight — small surface, 3 workers). + +CANARY ACCEPTANCE GATES (from hhtl-canary-inhabitance-plan.md): + +Correctness (binary, all must pass): +1. Revision output bit-exact (Fingerprint) and ULP ≤ 4 (TruthValue) vs + `src/hpc/nars.rs::revise` scalar reference, 10,000 seeded revisions, + zero divergences +2. Cascade routing deterministic across 100 runs +3. No `&mut self` in compute paths (clippy + sentinel-qa audit) +4. No static/lazy_static carrying mutable cognitive state in zone 1 +5. 
Typed surfaces at zone boundaries (no serde_json::Value, no DTOs) + +Performance (numeric, all must pass on Zen4 or SPR 8-core, AVX-512): +1. p99 revision latency warm: ≤ 1.5 µs +2. p99 revision latency cold: ≤ 15 µs +3. Cascade-only latency: ≤ 400 ns p99 +4. Codebook hit rate after 1M warmup: ≥ 95% +5. Throughput saturated: ≥ 1M revisions/sec per core sustained 10s +6. Working set per worker: ≤ 1 MB +7. ndarray::simd primitive coverage: 100% of hot-path SIMD + +Inhabitance (qualitative): +1. Canary code reads like the architecture document — 11 substrate steps + traceable to 11 specific function calls +2. No Bardioc-shaped code in canary path (no SQL builders, no ES DSL, + no JanusGraph traversals, no ClickHouse aggregations) +3. Sentinel-qa P0 SAFETY findings on new code: zero +4. 30-second screen recording committed (canary running end-to-end, p99 + on screen, hit rate climbing during warmup) + +DELIVERABLE: `.claude/knowledge/pr-x4-x9-canary-results.md` — measured +numbers per gate; SHIP / RE-MEASURE / RE-ARCHITECT decision; comparison +against the Bardioc baseline from bardioc-weekend-rebuild-prompt.md (if +the baseline has been run); next-steps recommendations. + +BUDGET: 1 week. If a gate fails, document the failure in the results +doc, then decide: +- Performance fail → re-examine cascade depth, codebook materialization + cost, or SIMD primitive coverage; patch and re-measure +- Correctness fail → P0; block dependent sprint work until resolved +- Inhabitance fail → re-write the wiring (F1/F2), not the substrate + +CANARY OUTCOME: +PASS → HHTL is operationally proved; per-workload Bardioc cutover becomes + mechanically composable; analytic-tier paths (A: trojan horse, C: + Databend) can be executed with confidence. +FAIL → HHTL claim is not yet validated; the next session decides whether + to debug the substrate or revisit the architecture. 
+``` + +--- + +## Cross-sprint operational notes + +### Specialist savant rotation + +The 6 specialist savants are **stateless re-roles**, not per-sprint +incarnations. The same `data-flow-savant` reviews PR-X10 preflight, then +PR-X11 preflight, then PR-X12 preflight, etc. Reduces savant context-switch +overhead per joint savant decision 6 ruling. + +### Codex P0 audits + +Run codex on the combined sprint diff at step 4, not per-worker. Output +goes to `.claude/knowledge/pr-x{N}-codex-audit.md`. Coordinators must +resolve every P0 before P2 review at step 6. + +### Branch hygiene + +Each sprint uses an integration branch (`pr-x{N}/integration`); per-worker +PRs target the integration branch, coordinator merges integration to +master after Protocol A step 7. Avoids 12 simultaneous PRs to master. + +### Deprecation cycle + +PR-X11 marks jc files `#[deprecated(since="0.X", note="moved to +ndarray::hpc::pillar")]` for one cycle. Removal in cycle N+2. PR-X13 +supersedes lance-graph-ontology bridge pattern with the same cadence. + +### Feature gate matrix (additive) + +```toml +# Default +default = ["std", "linalg"] + +# Per-sprint +splat3d = ["dep:..."] +splat4d = ["splat3d", "linalg"] +blocked_grid = ["std"] +linalg = ["std"] +pillar = ["linalg"] +codec = ["std", "blocked_grid"] +ogit_bridge = ["std"] +cognitive = ["blocked_grid", "linalg", "codec", "ogit_bridge"] + +# Aggregates +cognitive_full = ["cognitive", "splat4d", "pillar"] +``` + +Default builds stay small; canary opts in to `cognitive_full`. + +### Backward compat for splat3d consumers + +`pub use crate::hpc::linalg::Spd3 as Spd3;` etc — Rust monomorphizes +across type aliases (same type, not new type). Existing splat3d +consumers compile unchanged after PR-X10 lands. + +--- + +## What this prompt does NOT do + +- It does not run the 4-prompt analytic-tier arc (Bardioc baseline, + trojan horse, Databend). Those are independent and can run in parallel + with the substrate arc. 
The canary measures against the Bardioc + baseline if it has been run; absent that, the canary measures absolute + numbers. +- It does not migrate Bardioc workloads. The canary proves + *inhabitability* of one workload; per-workload migration is a follow-on + multi-month effort. +- It does not address HHTL theory or paper-writing. The canary is the + operational proof; theory artifacts are downstream. +- It does not contain code. It contains the kickoff prompts for each + sprint session; code is written inside those sessions. + +--- + +## Done criteria (substrate arc, 8 weeks) + +The substrate arc is "done" when: + +- All 6 sprints land per the W1-W8 schedule (44 sprint workers + 6 + coordinators + 6 specialist savants) +- `ndarray::hpc::*` 10-submodule layout is the canonical structure +- jc deprecated 1 cycle; lance-graph-ontology bridge pattern superseded +- The NARS-revision canary passes all 3 gate classes (correctness + + performance + inhabitance) +- 30-second screen recording committed showing canary running end-to-end +- `.claude/knowledge/pr-x4-x9-canary-results.md` written with measured + numbers and SHIP / RE-MEASURE / RE-ARCHITECT decision + +If all six criteria hit on schedule: HHTL is inhabited. Bardioc cognitive- +tier cutover is now a mechanical per-workload migration; the analytic +tier follows path A or path C per the four-prompt arc. The architecture +that started as a strategic document is now an operational substrate. 
diff --git a/.claude/knowledge/pr-x10-linalg-core-design.md b/.claude/knowledge/pr-x10-linalg-core-design.md index 02098e78..c5d5cd2d 100644 --- a/.claude/knowledge/pr-x10-linalg-core-design.md +++ b/.claude/knowledge/pr-x10-linalg-core-design.md @@ -42,6 +42,7 @@ src/hpc/linalg/ ├── svd.rs — Golub-Reinsch + one-sided Jacobi SVD ├── polar.rs — A = U·P decomposition (built on SVD) ├── matfn.rs — mat_exp + mat_log (Padé + scaling-and-squaring) +├── distance.rs — L1 / L2 / L∞ over f64x8 lanes (absorbed from PR #160 heel_f64x8) ├── quat.rs — Quat carrier + algebra (mul, conjugate, slerp, from_axis_angle, to_mat) ├── sh.rs — extended SH (deg 0..=7) — supersedes splat3d/sh.rs deg-3 only ├── conv.rs — Conv1D + Conv2D (im2col + gemm path, direct path for small kernels) @@ -169,6 +170,26 @@ Higham's scaling-and-squaring Padé(13/13) for general matrices (3 × ε_machine **Precision class: EXACT** for SPD path (via `eig_sym` + scalar `vml::exp_f32`/`vml::ln_f32`); **VERIFY** for general path (Padé approximant order vs scaling depth trade-off). +### Distance kernels — `linalg::distance` + +```rust +pub fn l1_f64_simd(a: &[f64], b: &[f64]) -> f64 { ... } // Σ |a_i − b_i| +pub fn l2_f64_simd(a: &[f64], b: &[f64]) -> f64 { ... } // √Σ (a_i − b_i)² +pub fn linf_f64_simd(a: &[f64], b: &[f64]) -> f64 { ... } // max |a_i − b_i| +``` + +Lane-parallel over `F64x8` with horizontal reduce at the tail. Absorbs the +`heel_f64x8::l1/l2/linf` kernels from PR #160 (lance-graph) — the code is +correct, the framing was wrong (it was filed as "Sprint 0a of a four-repo +integration arc"; the right home is here, alongside polar / matfn in the +linalg core). Bench parity vs the PR #160 implementation is part of the A6 +acceptance gate, not a separate worker. + +**Precision class: EXACT** for L1 and L∞ (no rounding beyond the underlying +subtract + abs). 
**VERIFY** for L2 (the final `sqrt` is one ULP; the sum is +order-of-summation dependent — A6 uses pairwise reduce for determinism, same +shape as `blas_level1::nrm2`). + ### Higher-degree SH — `linalg::sh` Supersedes `splat3d::sh.rs` (which ships deg-3 only). Adds deg-4 through deg-7: @@ -459,7 +480,7 @@ This is a LARGE sprint. Per the user's "12 agents + 1 coordinator" cadence: | 6 | **A3 — Matrix inverse (3×3, 4×4, general)** | 1 | `linalg/inverse.rs` | ~300 | | 7 | **A4 — Symmetric eig (Jacobi + QR)** | 1 | `linalg/eig_sym.rs` | ~450 | | 8 | **A5 — SVD (Golub-Reinsch + one-sided Jacobi)** | 1 | `linalg/svd.rs` | ~500 | -| 9 | **A6 — Polar + mat_exp + mat_log** | 1 | `linalg/polar.rs`, `linalg/matfn.rs` | ~400 | +| 9 | **A6 — Polar + mat_exp + mat_log + distance** | 1 | `linalg/polar.rs`, `linalg/matfn.rs`, `linalg/distance.rs` (absorbs PR #160 `heel_f64x8::l1/l2/linf`) | ~500 | | 10 | **A7 — SH deg 0..=7** | 1 | `linalg/sh.rs` (supersedes `splat3d/sh.rs`) | ~400 | | 11 | **A8 — Conv1D + Conv2D** | 1 | `linalg/conv.rs` | ~450 | | 12 | **A9 — Batched gemm + Norms + Activations** | 1 | `linalg/batched.rs`, `linalg/norm.rs`, `linalg/activations_ext.rs` | ~550 | @@ -516,7 +537,7 @@ Plus parity gates: 3. **f64 path?**: splat3d is f32-only. Inference modules are f32. Pillar probes use f64 internally for concentration math. Does `linalg-core` ship f32 AND f64? Lean: **f32 primary** (matches the rest of `hpc::*`), add `_f64` variants only on demand. Savant: rule on whether to pre-ship f64 for the Pillar consumers. -4. **`jc` consolidation path (a) vs (b)**: keep jc zero-dep on ndarray (path a) or relax for SPD only (path b)? Architectural call. Lean: **(a)** preserves the self-certifying property. Coordinator: confirm with jc-architect before committing. +4. 
**`jc` consolidation path (a) vs (b)**: ~~keep jc zero-dep (path a) or relax for SPD only (path b)?~~ **RESOLVED by joint savant P1-1 + invariant 12 (master ruling): path (b) — jc's math consolidates into `ndarray::hpc::pillar::*`. PR-X10 does not decide this; it ships the canonical ndarray-side surface that PR-X11 then consumes.** See §"PR-X11 consumption pattern" L390 above. 5. **Flash-attention as v1 or v2?**: flash-attention is ~3× the implementation complexity of naive attention. v1 ships naive only; v2 adds flash. OR v1 ships both. Lean: **v1 ships both** — the inference modules need flash for any sequence longer than ~512 tokens. Cost: ~250 extra LoC on A10. diff --git a/.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md b/.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md index 245de223..2c9f9362 100644 --- a/.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md +++ b/.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md @@ -90,16 +90,134 @@ aggregate-scan queries. So the ClickHouse strength is irrelevant, not absent. ## Zone model -| Zone | Layer | Role | Boundary contract | +| Zone | Column-state phase | Surface that watches | What "being in this zone" means | |---|---|---|---| -| **Zone 1** (hot / in-process) | lance-graph + ndarray + Ractor | cognitive shader stack, Rubicon gates, HHTL cascade | typed surfaces, no serde, Rule #3 territory | -| **Zone 2** (warm / persistence) | SurrealDB (+ Tantivy FTS) | cognitive system's own state — committed outcomes only | typed surfaces in, ACID-tx out | -| **Zone 3** (cold / egress) | sea-orm | legacy SQL bridge (PostgreSQL, MySQL, host org's DB) | DTOs / SQL rows — materialization happens here | - -**Ractor lives at zone boundaries**, never inside the zone-1 cascade. Actors are -the gates between deliberation and persistence (1↔2) and between persistence -and legacy egress (2↔3). Inside zone 1, the cascade is pure function composition -over typed surfaces. 
+| **Zone 1** (hot) | `committed = false`, currently held in mailbox-cycle scope | lance-graph cascade ops | the row is being deliberated; cascade compute is in-flight against the same bytes a future Zone-2 reader will see | +| **Zone 2** (warm) | `committed = true`, Lance-versioned | SurrealDB LIVE subscriptions, lance-graph reads | the row's truth-value crossed the Rubicon; any LIVE watcher with a matching predicate observes the flip as a column-state transition | +| **Zone 3** (cold) | `egressed_at IS NOT NULL`, mirrored once | sea-orm to legacy RDBMS | the row has been materialised into PG-shape for the legacy surface; the source Lance bytes are unchanged | + +**Zones are temporal phases of column state on a single Lance dataset, not +storage tiers.** Same physical bytes throughout. A row does not "move" from +zone 1 to zone 2; a column flips from `committed = false` to `true`, and +the LIVE watchers notice. There is no serialise / marshal / wire-format +step between strata because there are no strata — there is one Lance +dataset, multiple state-flag columns, and multiple dialect surfaces reading +the same buffers. + +This is the right framing for the Rubicon model: the crossing is a *column +flip*, not a write event. There is no "mailbox in RAM commits to +SurrealDB" — SurrealDB always saw the row, the row just changed state. The +mailbox-cycle still governs the commit (the handler decides when to flip +the flag, and `&mut self` there is the gated write), but the flip itself +is a state transition on bytes that didn't move. 
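
A minimal sketch of the "crossing is a column flip" reading, in plain Rust with only the standard library. `ThoughtColumns`, `cross_rubicon`, and `live_watch` are hypothetical names chosen for illustration; they are not Lance, SurrealDB, or lance-graph APIs, and a `Vec<bool>` merely stands in for a versioned Lance flag column. What it tries to show: the commit mutates one flag, and the payload buffer stays at the same address.

```rust
/// Hypothetical SoA stand-in for a single dataset: one payload column
/// plus one state-flag column. Not a real Lance or SurrealDB type.
struct ThoughtColumns {
    truth: Vec<f32>,      // payload column (the bytes every surface reads)
    committed: Vec<bool>, // state flag: false = zone 1, true = zone 2
}

impl ThoughtColumns {
    fn new(truth: Vec<f32>) -> Self {
        let committed = vec![false; truth.len()];
        Self { truth, committed }
    }

    /// Read-only view of the payload column.
    fn payload(&self) -> &[f32] {
        &self.truth
    }

    /// The gated write: the only mutation is the flag flip.
    /// Returns true if this call performed the crossing.
    fn cross_rubicon(&mut self, row: usize) -> bool {
        let was_committed = std::mem::replace(&mut self.committed[row], true);
        !was_committed
    }

    /// Stand-in for a LIVE subscription: a predicate over the flag column
    /// that yields the rows currently visible in zone 2.
    fn live_watch(&self) -> Vec<usize> {
        self.committed
            .iter()
            .enumerate()
            .filter(|&(_, &c)| c)
            .map(|(i, _)| i)
            .collect()
    }
}
```

Under these assumptions, a handler would call `cross_rubicon(row)` exactly once per commit; a second call on the same row returns `false` and changes nothing, and `payload().as_ptr()` is identical before and after the flip.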
+ +What stays true from earlier framings: +- The cascade inside a single handler body is pure function composition over + typed surfaces (Rule #3 territory) +- The `&mut self` in the handler IS the gated write — legitimate because it + IS the Rubicon crossing (the column flip), not "during computation" +- Typed surfaces at the dialect interfaces (SurrealQL parses to column + predicates; sea-orm projects to legacy DTOs; Databend pushes filters to + column kernels) — but these are *type-level* contracts on how each + dialect reads the same bytes, not perimeters around different stores + +See § "Column-substrate identity" below for the full unification. + +## Column-substrate identity — Lance ≡ Arrow ≡ ndarray SoA + +``` +Lance dataset (single physical store) + │ + ▼ +Lance column ≡ Arrow column buffer ≡ ndarray SoA + (one representation, all the way down) + │ + ├──→ lance-graph: XOR-cascade lookups, cognitive-shader cycles + │ (ndarray SIMD ops directly over the column bytes; + │ no copy, no serde, no marshal — the "in-RAM Thought" + │ IS the Lance column slot) + │ + ├──→ SurrealDB: SurrealQL parses → reads the same column + │ LIVE subscription = a watch on column-state predicates + │ + ├──→ sea-orm: SQL via Lance backend → reads the same column + │ (Zone-3 egress is materialise-once into PG-shape for the + │ legacy surface; the source bytes are unchanged) + │ + ├──→ Databend: analytic SQL → reads the same column + │ (ndarray::simd kernel swap → operates on the same bytes + │ the cognitive cascade just operated on) + │ + └──→ Tantivy: FTS index → built over the same column +``` + +**One physical representation, end to end.** The Lance column layout, the +Arrow column buffer layout, and the ndarray SoA layout are the same bytes +viewed through three names. The dialect surfaces (lance-graph cascade, +SurrealDB, sea-orm, Databend, Tantivy) all parse their respective query +languages down to operations on those same bytes. 
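
The "same bytes, several surfaces" claim can be sketched with the standard library alone. This is an illustration under stated assumptions, not the real integration: `Surface`, `masked_sum`, and `shares_bytes` are hypothetical names, `Arc<[f64]>` stands in for a Lance/Arrow column buffer, and the scalar loop stands in for an `ndarray::simd` kernel.

```rust
use std::sync::Arc;

/// One shared kernel; a scalar stand-in for a SIMD primitive.
fn masked_sum(column: &[f64], keep: impl Fn(f64) -> bool) -> f64 {
    column.iter().copied().filter(|&v| keep(v)).sum()
}

/// Hypothetical "dialect surface": a handle onto the shared column.
struct Surface {
    column: Arc<[f64]>,
}

impl Surface {
    fn attach(column: &Arc<[f64]>) -> Self {
        // Arc::clone bumps a reference count; the f64 buffer is not copied.
        Self { column: Arc::clone(column) }
    }

    /// Cascade-style read: reduce over every value.
    fn total(&self) -> f64 {
        masked_sum(&self.column, |_| true)
    }

    /// Analytic-style read, roughly `SELECT sum(x) WHERE x > threshold`.
    fn sum_above(&self, threshold: f64) -> f64 {
        masked_sum(&self.column, |v| v > threshold)
    }
}

/// True when both surfaces point at the same allocation.
fn shares_bytes(a: &Surface, b: &Surface) -> bool {
    Arc::ptr_eq(&a.column, &b.column)
}
```

Both reads go through the same `masked_sum` kernel over one allocation, which is the property the paragraph above describes. In this sketch the surfaces differ only in which predicate they hand to the kernel.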
+ +**ndarray amortises the SIMD primitive across the whole stack.** The same +kernel that runs the cognitive cascade, that Databend's filter pushdown +invokes, that Tantivy's indexer reads, that sea-orm projects to legacy +egress — they are the same kernel on the same bytes. ndarray pays for the +SIMD primitive once and the entire stack collects rent. No transcode tier, +no copy boundary, no format conversion at any zone. + +**Rubicon = column-state flip, not write event.** A Thought is a Lance row +from the moment it is allocated to the moment it is queried by any surface. +"Crossing the Rubicon" means flipping (e.g.) `committed: false → true` — +versioned natively by Lance, observed by any LIVE watcher with a matching +predicate, no serialisation involved. + +### What this dissolves + +| Earlier framing (wrong) | Why it's wrong | +|---|---| +| "Mailbox writes to SurrealDB on Rubicon crossing" | There is no write — SurrealDB always saw the row; the row just changed state | +| "MvccProvider::snapshot_ts threads across engines" | There is one Lance dataset with one version chain; all readers see the same version | +| "surrealdb-ractor as cf-event router" | No cf-event-as-message needed; mailboxes already share the same column slice that SurrealDB watches | +| "sea-orm-ractor entity-actor dispatch by PK" | The mailbox IS the row; no separate dispatch layer | +| "Zone 1 in-process vs Zone 2 durable" (as storage tiers) | Same physical bytes; zones are temporal phases of column state, not storage tiers | +| "TiKV as routing / coordination layer" | TiKV ranges are Lance dataset shards under the XOR cascade — substrate, not routing | +| "kv-lance translates records into Lance rows for SurrealDB" | No translation; SurrealQL parses directly against Lance columns that lance-graph already owns | + +### What survives — JITson / Cranelift, cleaner than before + +The compile-time → JIT pipeline does not collapse with the framing — it +sharpens: + +- **ndarray SoA layout = Lance 
column layout = known at OGIT-schema-compile time.** + The schema fixes the column shape; everything downstream specialises against it. +- **`DeriveEntityModel` (or equivalent) emits column-typed accessors at Rust + compile time** — typed handles into the same bytes for each dialect surface. +- **Cranelift JITs hot-path kernels specialised for the OGIT-derived column + types at first call** — predicate compilation, projection compilation, + cascade-step compilation, all against the typed column shape. +- **"Sinkin becomes compile next time"** — when a new column shape enters the + substrate (ontology evolution), the next compile cycle regenerates the typed + accessors and the JIT re-specialises against the new shape. +- **All dialect surfaces automatically inherit the new kernels** because + they all operate on the same column layout. Add a column → all surfaces + see it. Specialise a kernel → all surfaces use it. + +### Implication for the four-tier picture + +The four-tier picture later in this doc names `ndarray::simd` as "the +common SIMD substrate across all four tiers". That claim is correct, but +its load-bearing reason is the column-substrate identity, not "we happen +to use the same SIMD library in four places". The deeper fact: + +> **The column IS the SoA IS the ndarray buffer.** The cognitive cascade, +> the analytic scan, the FTS index build, and the graph traversal all +> operate on the same bytes through the same SIMD kernels. ndarray::simd +> is the common substrate because the substrate is genuinely one thing, +> not four parallel things wearing the same uniform. + +This is the actually-clean Foundry-aspiring shape: one physical store, one +column layout, one kernel set, multiple dialect surfaces. 
The "same data, +different syntax" claim is finally literal — not "same schema across +translation layers" but **same bytes, period.** ## Rule #3 ⊕ Rubicon ⊕ Per-thought bindspace (three-legged stool) @@ -204,7 +322,7 @@ depending on workload count. | HHTL distribution math is wrong | High | This is the load-bearing claim; numerical certification (PR-X11 pillars) covers cascade ops; add formal proofs for the XOR-projection bijectivity property before zone-2 commit | | 90° vector / Walsh-Hadamard basis breaks for non-projectable queries | High | API enforces "queries must be expressible in basis"; queries that aren't are bounced back to the caller with a typed error, not silently scanned | -## Click-moments inventory (the three architectural dissolutions) +## Click-moments inventory (the four architectural dissolutions) These are the moments where a perceived problem turned out to not be a problem: @@ -215,17 +333,32 @@ These are the moments where a perceived problem turned out to not be a problem: 2. **Ractor `&mut self` violated Rule #3** → **Rubicon model shows actors are commitment gates, not shared-state mutators.** The handler body IS the Rubicon crossing; `&mut self` there is the gated write, not "during - computation". Dual to Rule #3, not opposed. + computation". Dual to Rule #3, not opposed. **Refinement** (2026-05-19, + post-PR #404 rollback): the mailbox carries the commitment responsibility + implicitly, so there is no physical boundary between zones 1/2/3 for + actors to "live at" — Rubicon is per-mailbox-commit-cycle, distributed + everywhere there is a handler. 3. **ClickHouse OLAP gap blocked the new stack** → **HHTL shows the cognitive workload doesn't need OLAP, just project-and-lookup.** ClickHouse stays in Bardioc and is decommissioned when the last scan-aggregate query is ported (which is never, because cognitive queries don't have that shape). -All three dissolutions are structural — they don't require new code, they +4. 
**Multi-store consistency / cross-zone messaging looked like the hard + coordination problem** → **Column-substrate identity shows there is no + cross-zone messaging.** Lance column ≡ Arrow buffer ≡ ndarray SoA, same + bytes for every dialect surface (lance-graph, SurrealDB, sea-orm, Databend, + Tantivy). Rubicon is a column-state flip, not a write event. SurrealDB + LIVE subscriptions watch column predicates on bytes they were already + reading. The hard problem dissolves because there were never multiple + stores to keep consistent. See § "Column-substrate identity" above. + +All four dissolutions are structural — they don't require new code, they require seeing the existing architecture through the correct frame. That's why they "click hard": the answer was already in the design; it just needed the -right name. +right name. Dissolutions 1–3 are workload-shape dissolutions; #4 is the +substrate-identity dissolution and is the deepest of the four — it makes +the other three's "no copy, no marshal, no coordination" claims literal. ## What's NOT covered by this consolidation @@ -243,6 +376,158 @@ Honesty roster — things that genuinely don't fit and need separate stories: the typed-surface contract from zone 1 means schema changes ripple through Rust types. Migration discipline TBD. +## Four-tier picture: HHTL only has to win the cognitive layer + +The synthesis that compresses the whole consolidation arc to one diagram. +Bardioc had four CPU-heavy specialty layers. **Three of them already have +Rust-native successors that aren't HHTL.** HHTL only has to win at the +cognitive layer it was designed for. 
+ +| Tier | Workload shape | Bardioc layer | Rust-native successor | Acceleration | +|---|---|---|---|---| +| **Cognitive** (hot path, sub-µs) | project-and-lookup, cascade routing | (no Bardioc analog — application code) | **HHTL** = TiKV + SurrealDB + Ractor + ndarray + lance-graph | ndarray::simd (native), HHTL projection (2 orders of magnitude vs scan) | +| **Analytic** (cold path, ms) | scan-and-aggregate, OLAP, ad-hoc SQL | ClickHouse | **Databend** (Arrow + DataFusion + Tokio, MIT, ClickHouse-shape) | ndarray::simd injection (filter / aggregate / hash kernels) | +| **Search** (full-text) | inverted index, BM25, ngram, faceting | Elasticsearch / Lucene | **Tantivy** (under SurrealDB FTS, also via Quickwit) | ndarray::simd injection (bitpack decode / BM25 / skip-list intersection) | +| **Graph** (traversal) | BFS/DFS, edge-label filter, frontier expansion | JanusGraph | **lance-graph native** (typed surfaces over TiKV ranges) | ndarray::simd (frontier bitsets), no JNI | + +This is the load-bearing reframe for the migration argument: + +- **HHTL is genuinely new IP** — nothing exists like it; the cognitive layer + is where the architecture earns its keep. +- **The other three are inheritances** — Databend / Tantivy / lance-graph are + pre-existing Rust-native engines that already do what their Bardioc + counterparts did, just in Rust with Tokio and Arrow. +- **ndarray::simd is the common SIMD substrate across all four tiers** — + injection target for Databend + Tantivy (the trojan-horse prompts); + native for HHTL + lance-graph (the hot-path cognitive substrate). + +Migration scope shrinks proportionally. The total work is: +1. Build HHTL (PR-X4 + PR-X9, the genuinely new piece). +2. Adopt Databend, Tantivy, lance-graph (existing, just integrate). +3. Inject ndarray::simd into Databend + Tantivy (trojan horse prompts, + 1–2 engineer-weeks per target). +4. 
Cutover from Bardioc one workload at a time (read-only mirror → dual + write → primary-flip → decommission, the existing migration plan). + +**No transcode of ClickHouse, ES, or JanusGraph is required, ever.** + +## Why we don't transcode ClickHouse (cheap escape hatches) + +A full ClickHouse transcode is one of the hardest software undertakings in +modern infrastructure: ~1.2M LOC C++ core, ~150 vendored libraries, ~1000 +hand-tuned aggregation/scalar functions, decades of SIMD/cache/JIT +optimization. Realistic cost: **5–10 engineer-years**. Reference points, +none of them a line-for-line transcode: TiKV was written in Rust from the +start and still took ~5 years to mature; Servo, a from-scratch Rust browser +engine, ran ~10 years and shipped only in part; CockroachDB's +reimplementation of Postgres compatibility is still incomplete after a decade. + +Three cheaper escape hatches, in order of cost: + +| Approach | Cost | Outcome | +|---|---|---| +| **A. FFI inject ndarray::simd into ClickHouse** (trojan horse prompt) | 1–2 engineer-weeks | ClickHouse stays C++, hot kernels are Rust; legacy stack faster, Bardioc cutover urgency reduced | +| **B. Transcode only the vectorized executor** (~50–100k LOC) | 1–2 engineer-years | Hybrid C++ shell + Rust executor core; deep IP investment, narrow scope | +| **C. Adopt Databend + ndarray::simd injection** (databend prompt) | 0 transcode | Rust-native, ClickHouse-shape, MIT licensed, already maintained, rides upstream | + +**Recommended: C.** Databend already covers ClickHouse-shape workloads in +Rust on Arrow + DataFusion + Tokio. ndarray::simd injection earns the +"hand-tuned" performance parity. Combined cost is engineer-weeks, not +engineer-years, with zero transcode debt. + +A is also valuable in parallel — it accelerates Bardioc during the cutover +window and creates upstream contribution opportunities. B is rarely worth +it; only justified if you need ClickHouse-storage-format wire-compatibility +in a Rust-native engine, which the cognitive stack does not.
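A minimal sketch of what "injection" in paths A and C amounts to, assuming the engine exposes an aggregate kernel behind a swappable function signature: the stock scalar path stays as the reference, and a lane-chunked replacement is dropped in behind the same type. `SumKernel`, `sum_scalar`, `sum_chunked`, and `select_sum_kernel` are hypothetical names for illustration only; they are not Databend, ClickHouse, or `ndarray::simd` APIs.

```rust
/// Hypothetical kernel signature: the engine only ever sees this type.
type SumKernel = fn(&[f64]) -> f64;

/// Reference scalar path (stands in for the engine's stock kernel).
fn sum_scalar(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// Lane-chunked path (stands in for an injected SIMD kernel).
/// Eight independent accumulators give the compiler room to vectorize.
fn sum_chunked(values: &[f64]) -> f64 {
    const LANES: usize = 8;
    let mut acc = [0.0f64; LANES];
    let chunks = values.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for lane in 0..LANES {
            acc[lane] += chunk[lane];
        }
    }
    acc.iter().sum::<f64>() + tail.iter().sum::<f64>()
}

/// Pick the kernel once; callers never know which implementation runs.
fn select_sum_kernel(use_injected: bool) -> SumKernel {
    if use_injected { sum_chunked } else { sum_scalar }
}

fn main() {
    let column: Vec<f64> = (1..=1000).map(|i| i as f64).collect();
    let stock = select_sum_kernel(false)(&column);
    let injected = select_sum_kernel(true)(&column);
    // Parity gate: the injected kernel must agree with the stock kernel.
    assert!((stock - injected).abs() < 1e-9);
    println!("stock={stock} injected={injected}");
}
```

The design point is that the swap happens behind one signature, so existing tests keep exercising the same call sites while benchmarks compare the two implementations. Floating-point summation order differs between the two paths, which is why the parity check uses a tolerance rather than exact equality.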
+ +The C# ecosystem analog (asked separately): RavenDB is the closest +single-binary-vendor-everything analog to ClickHouse in .NET, with +EventStoreDB second. Neither is performance-competitive with ClickHouse on +OLAP scan, but they share the operational philosophy. Notable because the +ClickHouse design pattern (full vendoring + native compilation + +obsessive SIMD/cache tuning + willingness to patch upstream) is rare — +ClickHouse may be the only OSS database that does all four. Yandex +heritage is what made it possible. + +## Salvage from the 2026-05-19 cross-repo rollback (PR #404 / PR #160) + +The four-repo demo PR #404 in lance-graph (and its companion ndarray PR #160) +was reverted via PR #405 on 2026-05-19 — the architectural intent is +preserved as a next-cycle target, the code attempt was withdrawn. Two pieces +of that work are NOT dead and have their re-entry points named here so the +next-cycle implementation doesn't lose them: + +### 1. `heel_f64x8::{l1, l2, linf}_f64_simd` → PR-X10 A6 `linalg::distance` + +The distance kernels themselves are correct; the framing was wrong (filed as +"Sprint 0a of a four-repo integration arc" with cross-repo coupling that +made the rollback unavoidable). The same code re-emerges as +`ndarray::hpc::linalg::distance::{l1, l2, linf}_f64_simd` under worker A6 +in PR-X10 — the polar.rs / matfn.rs neighbourhood. Bench parity vs the +PR #160 implementation is part of A6's acceptance gate. See +`pr-x10-linalg-core-design.md` § "Distance kernels — `linalg::distance`". + +### 2. `lance-graph-contract::{ir, provider, actor}` → mostly redundant, except… + +The IR / provider types (`Operator`, `Cardinality`, `EngineHint`, +`MvccProvider`) duplicate work the HHTL arc covers natively — they don't +re-emerge. They're correctly dead. + +**Exception: `SupervisableShader` + `RestartBackoff`** have a future as +*mailbox-cycle commitment-gate primitives* on Ractor actors. 
**Important +framing refinement** (2026-05-19, post-rollback session): with the Rubicon +model, the mailbox itself carries the commitment responsibility, so the +gate fires *per-message-commit-cycle*, not at a physical zone-1↔zone-2 +boundary. The earlier framing ("they only fire at zone 1↔2 transitions") +was a category error — zones 1/2/3 are *logical* stratification of where +state physically lives, not perimeter walls actors cross. There is no +physical boundary because the mailbox IS the Rubicon. + +What this means concretely for the next-cycle implementation, **under the +column-substrate-identity framing** (see § "Column-substrate identity" +above): + +- `SupervisableShader` is the supervisor-aware wrapper around a Ractor + handler that owns a *column-flip cycle* (read column → compute → flip + state-flag → reply / drop). Its "supervision boundary" is the + flip-cycle, not a perimeter between stores — because there is no second + store. SurrealDB / sea-orm / Databend / Tantivy are dialect surfaces on + the same Lance column the handler is operating on. +- `RestartBackoff` governs how the supervisor responds when a flip-cycle + panics or returns an error before the flag is set. It gates *retry + attempts on the same column flip*, not retries across physical + infrastructure. The Lance version chain provides the natural retry + semantics (the flip either landed-and-committed in the version chain or + it didn't; SurrealDB LIVE watchers only see committed flips). +- Both primitives are stateless types that live in `lance-graph` (the + thinking layer); they don't belong in ndarray (the hardware layer). +- Re-entry point: a future PR-X14 or sibling sprint in the lance-graph + repo that introduces `LanceActor`-shaped wrappers for the canary's + commit path (see `hhtl-canary-inhabitance-plan.md` step 9 — the natural + first consumer is the NARS-revision handler that flips a `revised: false → + true` column on the belief row). 
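The column-flip cycle and its retry policy described above can be sketched as follows. This is an illustration under stated assumptions, not the withdrawn PR #404 code: `RestartBackoff` here is a hypothetical exponential-backoff policy, `BeliefRow` is an in-memory stand-in for a Lance-backed row, and `flip_cycle` / `supervise` are invented names. No Ractor or Lance APIs are used.

```rust
use std::time::Duration;

/// Hypothetical stateless backoff policy: delay doubles per retry, capped at `max`.
#[derive(Clone, Copy, Debug)]
struct RestartBackoff {
    base: Duration,
    max: Duration,
    max_attempts: u32,
}

impl RestartBackoff {
    /// Delay before retry number `attempt` (0-based), or None when the budget is spent.
    fn delay(&self, attempt: u32) -> Option<Duration> {
        if attempt >= self.max_attempts {
            return None;
        }
        let factor = 2u32.saturating_pow(attempt);
        Some(self.base.saturating_mul(factor).min(self.max))
    }
}

/// In-memory stand-in for one belief row: a value column plus the `revised` flag column.
#[derive(Debug, Default)]
struct BeliefRow {
    confidence: f64,
    revised: bool,
}

/// One flip cycle: read column -> compute -> flip flag.
/// The flag is written last, so a failed cycle leaves `revised == false`.
fn flip_cycle(
    row: &mut BeliefRow,
    compute: &mut dyn FnMut(f64) -> Result<f64, String>,
) -> Result<(), String> {
    let next = compute(row.confidence)?;
    row.confidence = next;
    row.revised = true;
    Ok(())
}

/// Supervisor loop: retry the same column flip until it lands or the policy gives up.
/// Returns the number of retries that were needed.
fn supervise(
    row: &mut BeliefRow,
    policy: RestartBackoff,
    mut compute: impl FnMut(f64) -> Result<f64, String>,
) -> Result<u32, String> {
    let mut attempt = 0;
    loop {
        match flip_cycle(row, &mut compute) {
            Ok(()) => return Ok(attempt),
            Err(e) => match policy.delay(attempt) {
                Some(d) => {
                    std::thread::sleep(d);
                    attempt += 1;
                }
                None => return Err(e),
            },
        }
    }
}

fn main() {
    let policy = RestartBackoff {
        base: Duration::from_millis(1),
        max: Duration::from_millis(4),
        max_attempts: 3,
    };
    let mut row = BeliefRow { confidence: 0.5, revised: false };
    let mut failures_left = 2;
    let retries = supervise(&mut row, policy, |c| {
        if failures_left > 0 {
            failures_left -= 1;
            Err("transient".to_string())
        } else {
            Ok((c + 0.9) / 2.0)
        }
    })
    .expect("flip should land within the retry budget");
    assert_eq!(retries, 2);
    assert!(row.revised);
    println!("{row:?} after {retries} retries");
}
```

Because the flag is the last write in the cycle, a retry never observes a half-applied flip, which mirrors the "landed-and-committed or it didn't" semantics the version chain is described as providing. In a real handler the sleep would presumably be an async timer rather than a blocking thread sleep.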
+ +The "no physical boundary 1/2/3 — and no second store either" insight is +captured as the **fourth click-moment** in the Click-moments inventory +above. Click-moments 1–3 were workload-shape dissolutions; #4 is the +substrate-identity dissolution and is the deepest of the four. The +SupervisableShader + RestartBackoff primitives can be small (~50 LoC each) +because they encode column-flip-cycle semantics, not cross-store +plumbing. + +### Lesson for future cross-repo arcs + +PR #404's failure mode was not bad code — it was a four-repo coupling +filed as a single arc, which made the rollback inherently cross-repo and +the merge-window inherently fragile. The architecturally-equivalent work +re-enters as multiple single-repo PRs across the Phase 2 schedule +(PR-X10 absorbs the distance kernels; a future PR-X14 absorbs the +column-flip-cycle primitives). The next-cycle architectural target — the +four-repo integration demo — happens *after* the canary lands in W8, +not before. Integration depends on substrate, not vice versa. And under +the column-substrate-identity framing, "integration" mostly means "wire +the dialect surfaces to read the columns the canary writes" — there is +no marshal layer to build. 
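Since the distance kernels re-enter through PR-X10 with bench parity as an acceptance gate, a scalar reference is a convenient oracle to compare a SIMD implementation against. The sketch below assumes slice-pair inputs of equal length; `l1_ref`, `l2_ref`, and `linf_ref` are hypothetical names, and the actual `_f64_simd` signatures are not shown in this document.

```rust
/// L1 (Manhattan) distance: sum of absolute component differences.
fn l1_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
}

/// L2 (Euclidean) distance: square root of the sum of squared differences.
fn l2_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f64>()
        .sqrt()
}

/// L-infinity (Chebyshev) distance: largest absolute component difference.
fn linf_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f64::max)
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 6.0, 3.0];
    // Component differences are 3, 4, 0.
    println!("l1={} l2={} linf={}", l1_ref(&a, &b), l2_ref(&a, &b), linf_ref(&a, &b));
}
```

A parity test would call the SIMD kernel and the reference on the same inputs and compare within a tolerance, since lane-wise accumulation can reorder floating-point additions.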
+ ## References - `pr-master-consolidation.md` — sprint plan, 10-submodule layout @@ -253,6 +538,10 @@ Honesty roster — things that genuinely don't fit and need separate stories: - `pr-x11-jc-consolidation-design.md` — numerical certification (cascade ops) - `pr-x12-codec-x265-design.md` — compressed leaf storage - `pr-x13-ogit-bridge-design.md` — OGIT TTL bundle (ontology grounding) -- `bardioc-weekend-rebuild-prompt.md` — migration baseline prompt +- `bardioc-weekend-rebuild-prompt.md` — migration baseline prompt (build the old stack honest) +- `ndarray-simd-trojan-horse-prompt.md` — inject ndarray::simd into ClickHouse + Tantivy (path A) +- `databend-ndarray-simd-prompt.md` — adopt Databend + ndarray::simd as ClickHouse successor (path C, recommended) +- `hhtl-canary-inhabitance-plan.md` — Phase 2 entry condition: names the NARS-revision canary + correctness/performance/inhabitance gates +- `hhtl-substrate-execution-prompt.md` — Phase 2 Protocol A execution prompt (8 weeks, 6 sprints, 44 workers; per-sprint kickoff blocks for W1-W8) - `.claude/rules/data-flow.md` — Rule #3 source -- lance-graph PR #404 — four-repo demo (architectural target) +- lance-graph PR #404 — four-repo demo (architectural target; merge reverted via PR #405 in 2026-05-19 cross-repo rollback — intent preserved as next-cycle target, code attempt withdrawn)