diff --git a/.claude/knowledge/databend-ndarray-simd-prompt.md b/.claude/knowledge/databend-ndarray-simd-prompt.md
new file mode 100644
index 00000000..dfce5a2b
--- /dev/null
+++ b/.claude/knowledge/databend-ndarray-simd-prompt.md
@@ -0,0 +1,246 @@
+# Databend + ndarray::simd — Claude Code Flex Prompt
+
+Adopt Databend as the Rust-native ClickHouse successor and inject `ndarray::simd`
+into its hot kernel paths. This is the **recommended ClickHouse-tier
+migration target** per `stack-consolidation-bardioc-to-hhtl.md` (path C: 0
+transcode cost, weeks not years to OLAP parity).
+
+Companion to:
+- `ndarray-simd-trojan-horse-prompt.md` (path A — FFI into stock ClickHouse,
+  buys time during cutover)
+- `bardioc-weekend-rebuild-prompt.md` (the baseline measurement target)
+
+Copy the block below into a fresh Claude Code session. Authorize
+`--allowed-tools '*'`, Rust 1.94, Docker.
+
+Budget: 24 hours wall-clock (half the trojan horse — Databend is already
+Rust-native, no FFI bridge to build).
+
+---
+
+```text
+You are integrating `ndarray::simd` (from adaworldapi/ndarray, AVX-512 default,
+`target-cpu=x86-64-v4`) into Databend (datafuselabs/databend, a Rust columnar
+OLAP engine with its own Arrow-compatible execution layer on Tokio; Apache-2.0
+core plus Elastic-2.0 enterprise code). The deliverable is a fork that swaps
+Databend's SIMD code paths for ndarray::simd primitives, benchmarks against
+stock Databend AND stock ClickHouse, and produces a report comparing all three.
+
+This is path C from the consolidation: Databend is the recommended
+ClickHouse successor for the AdaWorldAPI stack's analytic tier. Bardioc's
+ClickHouse decommissions when Databend + ndarray::simd reaches parity on
+the OLAP workloads that matter.
+
+Spawn 8 parallel workers + 1 coordinator. Git worktrees per worker. Branch:
+`databend-simd/{role}-{id}`. Integration via docker-compose stand-up of
+three OLAP engines side-by-side.
+
+## Why Databend, not transcode ClickHouse
+
+Full ClickHouse transcode is 5–10 engineer-years. Databend is:
+- Rust-native (no FFI bridge needed)
+- Arrow-compatible columns + Tokio (fits the wider Rust ecosystem; it does not embed DataFusion)
+- ClickHouse-shape SQL dialect (much of TPC-H ports unchanged)
+- Apache-2.0 core, Elastic-2.0 enterprise code (check terms before linking into the AdaWorldAPI codebase)
+- Already maintained by a funded team (datafuselabs)
+- Smaller hot kernel surface than ClickHouse — fewer kernels to swap
+
+Trade-off accepted: Databend's storage format is not ClickHouse-wire-compatible.
+The migration plan is workload-by-workload re-ingestion from Bardioc Cassandra
+into Databend, not in-place storage swap. Acceptable because Bardioc cutover
+already involves dual-write phases (see bardioc-weekend-rebuild-prompt.md).
+
+## Databend SIMD injection targets
+
+Fork Databend at the current stable tag. Add ndarray as a workspace dep.
+Replace target SIMD paths with ndarray::simd calls. Tests stay; benches add.
+
+Priority order (most-impact kernels first):
+
+1. **`src/query/expression/src/kernels/filter.rs`** — column filter
+   `mask & column` and packed-int boolean evaluation →
+   `ndarray::simd::filter_apply_mask`
+2. **`src/query/functions/src/aggregates/aggregate_sum.rs`** + `avg.rs` +
+   `min_max.rs` → `ndarray::simd::reduce_{sum,min,max,mean}` for all
+   numeric types (f32, f64, i32, i64, u32, u64)
+3. **`src/query/expression/src/kernels/hash.rs`** — hash-table probing for
+   joins and group-by → `ndarray::simd::hash_xxh3_batch`
+4. **`src/query/functions/src/scalars/comparison.rs`** — column-vs-column and
+   column-vs-literal `< == >` → `ndarray::simd::compare_{lt,eq,gt}`
+5. **`src/query/expression/src/kernels/take.rs`** — gather operations for
+   selection vectors → `ndarray::simd::gather_{f32,f64,u32,u64}`
+6. **`src/common/storage/parquet/`** — parquet decode hot path (bitpack +
+   RLE) → `ndarray::simd::{bitpack_decode,rle_decode}`
+7. **`src/query/functions/src/scalars/string/`** — substring / position
+   functions → `ndarray::simd::substring_find`
+
+Databend test suite is comprehensive — `cargo test --workspace` must pass
+unchanged after each swap. SIMD primitives that don't exist yet in
+ndarray::simd: document the gap and skip the kernel (becomes a follow-on
+ndarray PR under the W1a consumer contract).
+
+## Worker split (8 + coordinator)
+
+| Worker | Target | Role |
+|---|---|---|
+| W1 | Fork + dep wiring | Fork Databend at stable tag; add ndarray dep; CI setup; bench harness skeleton |
+| W2 | Kernel 1 (filter) | Filter / mask kernel swap + parity tests + bench vs stock |
+| W3 | Kernel 2 (aggregates) | Sum/avg/min/max for all numeric types + bench |
+| W4 | Kernel 3 (hash) | Hash-table probing + group-by + join hash + bench |
+| W5 | Kernel 4 (comparison) | Comparison ops + bench |
+| W6 | Kernel 5 + 6 (take + parquet) | Gather + parquet decode + bench |
+| W7 | Kernel 7 (string) | Substring / position + bench |
+| W8 | Three-way bench | docker-compose: stock ClickHouse + stock Databend + ndarray-Databend; identical workload; report generator |
+
+Coordinator: integration testing, cherry-pick to main branch, docker-compose
+orchestration, REPORT.md generation.
+
+## Benchmark workload
+
+Run THREE engines against the SAME workload:
+- **Stock ClickHouse** (reference performance — the bar to beat or match)
+- **Stock Databend** (current Rust-native baseline)
+- **ndarray-Databend** (the fork from this prompt)
+
+Workloads:
+1. **TPC-H scale factor 10** — Q1, Q3, Q6, Q14 (these stress the kernels
+   we swapped: filter, agg, join, group-by). Standard benchmark, comparable
+   across the industry.
+2. **ClickBench** — ClickHouse's open benchmark (Databend is one of the
+   listed systems), 43 queries on a real web-analytics dataset. Designed
+   for ClickHouse-vs-X comparison.
+3. **Cognitive analytics mini-workload** — 100 ad-hoc queries over a
+   synthetic NARS-revision log (joins, time-bucketing, top-K aggregation).
+   This represents the actual operational-analytics queries the AdaWorldAPI
+   stack will run against egressed cognitive state.
+
+Report per engine:
+- p50 / p95 / p99 query latency per query
+- Cold-cache vs warm-cache latency
+- CPU instructions retired (`perf stat`)
+- Peak memory
+- Indexing/ingestion throughput
+
+Output: `./benchmarks/REPORT.md` with three-column comparison tables.
+
+## Acceptance criteria
+
+Per kernel swap:
+1. Bit-exact parity for integer, ULP-bounded for float
+2. No more than 5% slower than stock Databend on the per-kernel bench, or faster
+3. Existing Databend test suite passes (`cargo test --workspace`)
+
+Per engine:
+1. All TPC-H + ClickBench queries return correct results on all three
+   engines (cross-validate ClickHouse ↔ Databend ↔ ndarray-Databend)
+2. ndarray-Databend geomean latency ≤ stock Databend geomean latency (equal or faster)
+3. ndarray-Databend within 2× of stock ClickHouse on geomean latency (the
+   migration story is "Rust-native parity at acceptable cost", not
+   "beat ClickHouse on every query")
+
+If ndarray-Databend beats ClickHouse on ANY query: that's a major signal,
+call it out in REPORT.md.
+
+## Anti-goals
+
+- Do NOT add new ndarray::simd primitives this weekend. If a kernel needs a
+  missing primitive, document the gap and skip the kernel. The gap becomes
+  a follow-on ndarray PR.
+- Do NOT submit upstream PRs to Databend this weekend. The deliverable is
+  the validated fork + benchmark report. Upstream contribution is a
+  separate follow-on after numbers are clean and reviewed.
+- Do NOT change the Rust toolchain. Build with what the fork's `rust-toolchain.toml` pins; add no new unstable features.
+- Do NOT optimize Databend's planner / SQL parser / catalog. The point is
+  kernel-level SIMD swap, not architecture work.
+- Do NOT touch HHTL substrate (PR-X4, PR-X9). This is independent OLAP-tier
+  work; HHTL is the cognitive-tier work.
+
+## Time budget (24 hours)
+
+- Hour 0-2: W1 (fork + dep wiring + bench harness skeleton)
+- Hour 2-12: W2-W7 in parallel (kernel swaps + per-kernel benches)
+- Hour 12-18: W8 (three-way docker-compose stack + ClickBench run)
+- Hour 18-22: cognitive mini-workload + report generation
+- Hour 22-24: REPORT.md write-up + handoff
+
+If a kernel doesn't reach parity in its allotted window, document the gap
+and skip. Honest negatives are also data — they tell us which ndarray::simd
+primitives need follow-on work.
+
+## Strategic outcomes (what the report unlocks)
+
+1. **Migration target validated**: if ndarray-Databend reaches Databend
+   parity AND is within 2× of ClickHouse on TPC-H + ClickBench, the
+   consolidation doc's "Databend is the ClickHouse successor" claim is
+   evidenced rather than asserted.
+
+2. **Three-engine reference point**: future Databend or ClickHouse PRs can
+   re-run this exact harness and see whether ndarray::simd injection is
+   still worth it. Living benchmark, not a one-shot report.
+
+3. **Cognitive-tier evidence**: the cognitive mini-workload demonstrates
+   that Databend handles the actual operational-analytics queries the
+   AdaWorldAPI stack will issue (post-cognitive egress to SQL). If those
+   queries are sub-second on ndarray-Databend, the analytics tier is
+   solved without further work.
+
+4. **ndarray::simd cross-validation**: kernels validated against TWO
+   engines (Databend benchmarks plus the trojan-horse ClickHouse-via-FFI
+   benchmarks) is much stronger evidence than either alone. The
+   intersection set (kernels both engines stress the same way) becomes the
+   ndarray::simd "battle-tested" subset.
+
+5. **Decommission timeline**: Bardioc ClickHouse can be decommissioned
+   per-workload when ndarray-Databend passes the relevant cognitive
+   mini-workload subset, not all at once. Risk-bounded cutover.
+
+Begin. Report progress every 4 hours with kernel done / in-progress /
+blocked + parity pass-fail + perf delta vs stock Databend AND stock
+ClickHouse.
+```
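+
+The per-engine latency gates in the prompt above compare geometric means. A
+minimal sketch of that check, assuming per-query latencies have already been
+collected in milliseconds; `geomean`, `EngineGates`, `check_gates`, and the
+sample numbers are hypothetical and not part of Databend or the bench harness:
+
+```rust
+/// Geometric mean of strictly positive, finite latencies, computed in log
+/// space so a long query list cannot overflow a running product.
+fn geomean(latencies_ms: &[f64]) -> Option<f64> {
+    if latencies_ms.is_empty() || latencies_ms.iter().any(|&x| !(x > 0.0 && x.is_finite())) {
+        return None;
+    }
+    let log_sum: f64 = latencies_ms.iter().map(|x| x.ln()).sum();
+    Some((log_sum / latencies_ms.len() as f64).exp())
+}
+
+/// Outcome of the two per-engine latency gates (hypothetical type).
+#[derive(Debug, PartialEq)]
+struct EngineGates {
+    not_slower_than_stock_databend: bool,
+    within_2x_of_clickhouse: bool,
+}
+
+/// Lower latency is better, so both gates are "fork <= bound" comparisons.
+fn check_gates(clickhouse: &[f64], stock: &[f64], fork: &[f64]) -> Option<EngineGates> {
+    let (ch, st, fk) = (geomean(clickhouse)?, geomean(stock)?, geomean(fork)?);
+    Some(EngineGates {
+        not_slower_than_stock_databend: fk <= st,
+        within_2x_of_clickhouse: fk <= 2.0 * ch,
+    })
+}
+
+fn main() {
+    // Made-up latencies (ms) for three queries, for illustration only.
+    let clickhouse = [10.0, 40.0, 25.0];
+    let stock = [20.0, 50.0, 40.0];
+    let fork = [16.0, 45.0, 36.0];
+    println!("{:?}", check_gates(&clickhouse, &stock, &fork));
+}
+```
+
+Working in log space also makes the result insensitive to query order, and a
+single slow query shifts the geomean far less than it would shift a mean.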
+
+---
+
+## Notes for using this prompt
+
+- Databend pins its toolchain in `rust-toolchain.toml` (historically a nightly channel); confirm it before assuming
+  Rust 1.94 stable builds it. ~10 min full build, ~30s incremental. No CMake, no JVM, no FFI bridge: pure Cargo.
+- ClickHouse stand-up via official docker image (`clickhouse/clickhouse-server`).
+- Databend has an official docker image too (`datafuselabs/databend`).
+- ClickBench dataset is ~14GB compressed; provision disk accordingly.
+- TPC-H generation via `dbgen`; scale factor 10 produces ~10GB.
+- The cognitive mini-workload is the most important — it's the only one
+  that's actually shaped like AdaWorldAPI's real future queries.
+
+## Composition with other prompts
+
+This prompt sits inside the four-prompt strategic arc:
+
+1. **`bardioc-weekend-rebuild-prompt.md`** — build the OLD stack honest
+   (migration baseline measurement target)
+2. **`stack-consolidation-bardioc-to-hhtl.md`** — the architectural reframe
+   doc (why the NEW stack wins, four-tier picture)
+3. **`ndarray-simd-trojan-horse-prompt.md`** — path A: inject ndarray::simd
+   INTO the legacy stack (ClickHouse + Tantivy via FFI) — buys time during
+   cutover, accelerates legacy
+4. **`databend-ndarray-simd-prompt.md`** (this) — path C: adopt the
+   Rust-native ClickHouse-shape successor with ndarray::simd injection —
+   the actual migration TARGET
+
+Combined timeline:
+- Weekend 1: prompt 1 (Bardioc baseline)
+- Weekend 2: this prompt (Databend integration)
+- Weekend 3: prompt 3 (trojan horse — optional, buys cutover time)
+- Ongoing: HHTL development (PR-X4 + PR-X9), workload-by-workload cutover
+
+## Follow-on opportunities (NOT this weekend)
+
+- Upstream PR cadence to Databend: 1 PR per parity-or-better kernel; faster
+  cycle than ClickHouse because Rust-native (no FFI review burden)
+- Polars integration: same ndarray::simd primitives plug into Polars
+  DataFrame ops; weekend follow-on
+- DataFusion integration: arrow-rs has SIMD for filter/take/aggregate;
+  ndarray::simd could plug in there too, benefiting the entire
+  DataFusion-derived ecosystem (GreptimeDB, InfluxDB IOx, Ballista)
+- Quickwit integration: combines Tantivy trojan horse + Databend analytics
+  in one operational stack
diff --git a/.claude/knowledge/hhtl-canary-inhabitance-plan.md b/.claude/knowledge/hhtl-canary-inhabitance-plan.md
new file mode 100644
index 00000000..fb37a8f2
--- /dev/null
+++ b/.claude/knowledge/hhtl-canary-inhabitance-plan.md
@@ -0,0 +1,229 @@
+# HHTL Canary Inhabitance Plan
+
+Date: 2026-05-19
+Status: Phase 2 entry condition — names the canary workload for the 6-sprint substrate arc
+Companion docs:
+- `stack-consolidation-bardioc-to-hhtl.md` (architectural frame)
+- `pr-master-consolidation.md` (6-sprint plan)
+- `pr-master-consolidation-savant-verdict.md` (Phase 1 verdict — READY-WITH-DOC-FIXES, all patches applied)
+- `hhtl-substrate-execution-prompt.md` (Phase 2 execution flex prompt — sibling to this doc)
+
+## Why this doc exists
+
+The strategic arc proves the new architecture wins **on paper**. The 6-sprint
+plan moves PR-X4 + PR-X9 from **design to substrate**. Neither artifact answers
+the question the substrate has to answer to count as **inhabited**: when does
+one specific cognitive query path *run end-to-end on the new architecture using
+the new idioms*?
+
+This doc names the canary. The canary is what closes the gap between
+"substrate exists" and "substrate is lived in."
+
+## The canary: NARS revision routed through HHTL cascade
+
+**Workload**: a NARS belief revision triggered by a perceptual surface, routed
+through the splat4d cascade to the relevant basin, materializing the basin
+codebook entry on demand, returning a revised `TruthValue` via the Rubicon
+commit gate, persisted to SurrealDB through a typed-surface adapter.
+
+**Why this workload**:
+- It is **architecturally pure** — exercises every load-bearing piece of the
+  new substrate (cascade, codebook, Rule #3, Rubicon, per-thought bindspace,
+  typed surfaces, zone-1↔2 boundary, ndarray::simd kernels)
+- It is **real** — NARS revision is a primary cognitive workload, not a
+  synthetic benchmark; the existing Bardioc stack runs it constantly
+- It is **measurable** — has a scalar reference implementation in
+  `src/hpc/nars.rs` to compare against for correctness
+- It is **scoped** — one query path, not a system migration; can be
+  retracted without affecting parallel sprint work
+- It is **representative** — the result generalizes: if revision-via-HHTL
+  works, every other cascade-routed cognitive op works the same way
+
+## What "routed through HHTL" concretely means
+
+Each step exercises a specific substrate primitive. This is the inhabitance
+checklist — not the implementation order:
+
+| Step | Substrate piece | Rule / discipline |
+|---|---|---|
+| 1. Perceptual surface arrives at a Ractor mailbox | Ractor as Rubicon gate (not Erlang) | Per-thought bindspace begins on mailbox entry |
+| 2. Surface → `Base17` typed wrapper | ndarray::hpc::cognitive (PR-X9) | Typed surface, not DTO |
+| 3. `CascadeAddr::from_position` Hilbert-3D encode | PR-X10 A12 hilbert.rs | Deterministic, no shared state |
+| 4. Cascade L1 XOR projection | PR-X4 splat4d cascade | Single XOR + table-addressing, no scan |
+| 5. Cascade L2-L4 hops | PR-X4 splat4d cascade | Each hop = 1 XOR; total ≤ 4 hops |
+| 6. Basin lookup at leaf address | PR-X9 LazyBlockedGrid | Lazy: codebook present → return; absent → materialize |
+| 7. Basin materialization (cold path only) | PR-X12 codec (rANS decode) | Decode under the Rubicon write-back gate, not during cascade |
+| 8. NARS revision over (existing truth, new evidence) | hpc::nars existing | Pure function: returns new `TruthValue`, no `&mut self` |
+| 9. Rubicon commit | Ractor handler `&mut self` is the legitimate gated write | Single committed outcome per mailbox message |
+| 10. Zone-1↔2 boundary crossing | sea-orm at zone 3 (only if egressing); SurrealDB at zone 2 | Typed surface in, ACID-tx out, materialization once |
+| 11. Per-thought bindspace dies | Message lifetime | No global registry retained |
+
+Eleven steps, one query path, four hops, sub-microsecond mean on the warm path
+(claimed). The canary either reaches that envelope or the architecture is wrong.
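+
+Steps 8 and 9 separate pure compute from the gated write. A minimal sketch of
+that split, assuming the textbook NARS revision rule with evidential horizon
+k = 1; the `TruthValue` layout, `revise`, and `RevisionHandler` here are
+hypothetical stand-ins, and the real `src/hpc/nars.rs` may differ:
+
+```rust
+/// Hypothetical stand-in for the repo's `TruthValue` (frequency, confidence).
+#[derive(Clone, Copy, Debug, PartialEq)]
+struct TruthValue {
+    f: f64,
+    c: f64,
+}
+
+/// Evidential horizon; k = 1 is the usual textbook default (an assumption here).
+const K: f64 = 1.0;
+
+/// Pure revision: reads two truth values and returns a new one.
+/// Assumes both confidences lie in [0, 1) so the evidence weights are finite.
+fn revise(a: TruthValue, b: TruthValue) -> TruthValue {
+    let w1 = K * a.c / (1.0 - a.c); // evidence weight behind `a`
+    let w2 = K * b.c / (1.0 - b.c); // evidence weight behind `b`
+    let w = w1 + w2;
+    if w == 0.0 {
+        return a; // no evidence on either side, nothing to revise
+    }
+    TruthValue {
+        f: (w1 * a.f + w2 * b.f) / w, // evidence-weighted frequency
+        c: w / (w + K),               // pooled evidence back to confidence
+    }
+}
+
+/// Hypothetical handler: the only place that holds `&mut self`.
+struct RevisionHandler {
+    committed: TruthValue,
+}
+
+impl RevisionHandler {
+    fn handle(&mut self, evidence: TruthValue) -> TruthValue {
+        let revised = revise(self.committed, evidence); // compute, no mutation
+        self.committed = revised; // commit, one write per message
+        revised
+    }
+}
+
+fn main() {
+    let mut handler = RevisionHandler { committed: TruthValue { f: 1.0, c: 0.5 } };
+    let out = handler.handle(TruthValue { f: 0.0, c: 0.5 });
+    println!("revised: {out:?}");
+}
+```
+
+Because `revise` takes its inputs by value and returns a value, it can be
+tested against a scalar reference without constructing any handler.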
+
+## Measurement gates
+
+The canary passes Phase 2 when **all** of the following hold on a Zen4 or
+Sapphire Rapids 8-core box, AVX-512 enabled (`target-cpu=x86-64-v4`):
+
+### Correctness gates (binary)
+
+1. **Revision output matches scalar reference**:
+   - `Fingerprint` (u64) bit-exact match against `src/hpc/nars.rs::revise`
+   - `TruthValue` (f, c) within 4 ULP of the scalar reference
+   - 10,000 randomly-seeded revisions, zero divergences allowed
+2. **Cascade routing is deterministic**:
+   - Same `(Base17, position)` → same `CascadeAddr` across runs
+   - Same `CascadeAddr` → same basin entry (warm cache or cold-materialized)
+   - Bit-exact reproducibility across 100 runs
+3. **No `&mut self` during compute** (compile-time enforcement):
+   - `ndarray::hpc::cognitive::*` engines have `revise(&self, ...) -> Result`
+   - Only Ractor handlers carry `&mut self` and only for commit, never compute
+   - Clippy lint `clippy::needless_pass_by_ref_mut` clean (a nursery lint; enable it explicitly)
+4. **Per-thought bindspace is per-thought**:
+   - No `static`/`lazy_static`/`OnceLock` carrying mutable cognitive state
+     inside zone 1 — audited by grep + sentinel-qa review
+5. **Typed surfaces at zone boundaries**:
+   - Zone 1 → zone 2: `ndarray::hpc::*` types, no `serde_json::Value`, no
+     `HashMap<String, Box<dyn Any>>`, no DTO layer
+   - Zone 2 → zone 3: `sea-orm` ActiveModel, materialization exactly once
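+
+The ULP bound in correctness gate 1 can be checked by comparing floats through
+their bit patterns. A minimal sketch, assuming `TruthValue` components are
+`f32` (the document does not state the width); `ordered_bits`, `ulp_distance`,
+and `within_ulps` are hypothetical helper names:
+
+```rust
+/// Map an f32 to an integer that increases with the float's numeric value,
+/// so ULP distance becomes a plain integer difference.
+fn ordered_bits(x: f32) -> i64 {
+    let bits = x.to_bits() as i32; // reinterpret, sign bit included
+    if bits < 0 {
+        // Negative floats sort in reverse by raw bits; mirror them below zero.
+        (i32::MIN as i64) - (bits as i64)
+    } else {
+        bits as i64
+    }
+}
+
+/// Number of representable f32 steps between `a` and `b`; `None` for NaN.
+fn ulp_distance(a: f32, b: f32) -> Option<u64> {
+    if a.is_nan() || b.is_nan() {
+        return None;
+    }
+    Some((ordered_bits(a) - ordered_bits(b)).unsigned_abs())
+}
+
+/// True when both values are non-NaN and at most `max_ulps` steps apart.
+fn within_ulps(a: f32, b: f32, max_ulps: u64) -> bool {
+    matches!(ulp_distance(a, b), Some(d) if d <= max_ulps)
+}
+
+fn main() {
+    let reference = 0.75_f32;
+    let candidate = f32::from_bits(reference.to_bits() + 2);
+    println!("distance = {:?}", ulp_distance(reference, candidate));
+    println!("within 4 ULP = {}", within_ulps(reference, candidate, 4));
+}
+```
+
+Treating NaN as a failed comparison keeps a divergent kernel from passing the
+gate by accident; +0.0 and -0.0 compare as zero steps apart.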
+
+### Performance gates (numeric)
+
+1. **p99 revision latency** (warm cache, cascade depth ≤ 4):
+   ≤ **1.5 µs** (target 700 ns mean per the HHTL claim; allow 2× headroom on p99)
+2. **p99 revision latency** (cold cache, includes basin materialization):
+   ≤ **15 µs** (codec decode + cascade + revision; rANS decode dominates)
+3. **Cascade-only latency** (excluding revision math):
+   ≤ **400 ns p99** (4 XOR hops + 4 table addressings)
+4. **Codebook hit rate after 1M revisions warmup**:
+   ≥ **95%** (sparse basins not pre-materialized; popular cells warm fast)
+5. **Throughput, saturated**:
+   ≥ **1M revisions/sec** per core sustained over 10 seconds (~1 µs amortized)
+6. **Working set per worker thread**:
+   ≤ **1 MB** (Zen4 has 1 MB of L2 per core, SPR 2 MB; on Zen4 this leaves no headroom)
+7. **ndarray::simd primitive coverage**:
+   100% of hot-path SIMD ops route through `ndarray::simd::*` — zero raw
+   intrinsics in the cognitive path (enforced by clippy lint and the W1a
+   consumer contract gate)
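+
+The p99 gates above need an agreed percentile definition. A minimal sketch
+using the nearest-rank method over nanosecond samples; `percentile_ns` is a
+hypothetical helper, and the timed loop body is a placeholder rather than the
+real revision call:
+
+```rust
+use std::time::Instant;
+
+/// Nearest-rank percentile over nanosecond samples; `p` must be in (0, 100].
+/// Sorts the slice in place.
+fn percentile_ns(samples: &mut [u64], p: f64) -> Option<u64> {
+    let n = samples.len();
+    if n == 0 || !(p > 0.0 && p <= 100.0) {
+        return None;
+    }
+    samples.sort_unstable();
+    // 1-based rank: smallest sample with at least p% of samples at or below it.
+    let rank = (p * n as f64 / 100.0).ceil() as usize;
+    Some(samples[rank.clamp(1, n) - 1])
+}
+
+fn main() {
+    let mut samples = Vec::with_capacity(10_000);
+    let mut acc = 0u64;
+    for i in 0..10_000u64 {
+        let start = Instant::now();
+        // Placeholder work standing in for one canary revision.
+        acc = std::hint::black_box(acc ^ i.wrapping_mul(0x9E37_79B9_7F4A_7C15));
+        samples.push(start.elapsed().as_nanos() as u64);
+    }
+    let p99 = percentile_ns(&mut samples, 99.0).unwrap();
+    println!("p99 = {p99} ns, warm gate (1.5 us) pass = {}", p99 <= 1_500);
+}
+```
+
+At these latencies the cost of reading the clock is a visible fraction of each
+sample, so a real harness would likely time batches of revisions and divide,
+or subtract a measured timer overhead.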
+
+### Inhabitance gates (qualitative)
+
+1. **The canary path reads like the architecture document.** A new reader
+   should be able to trace each of the 11 steps above to a specific function
+   in the codebase. If the code is more complex than the architecture
+   description, the architecture didn't get inhabited — a translation
+   layer got built.
+2. **No "Bardioc-shaped" code in the canary path.** No SQL builders for
+   the lookup, no Elasticsearch-shaped query DSL, no JanusGraph-shaped
+   traversal, no ClickHouse-shaped aggregation. The cascade is the lookup;
+   the codebook is the storage; the Rubicon is the commit. If any
+   step reaches for a legacy idiom, the canary has not inhabited.
+3. **The canary survives a sentinel-qa audit** with zero P0 SAFETY findings
+   on the new code (existing scalar reference is grandfathered).
+4. **The integration sprint produces a 30-second screen recording** showing
+   the canary running end-to-end, p99 latency on screen, codebook hit
+   rate climbing during warmup. Recording is committed to the repo.
+
+## What is NOT the canary
+
+Explicit anti-scope so the canary doesn't drift into a system migration:
+
+- **Not**: a full Bardioc → HHTL stack swap
+- **Not**: a multi-workload benchmark suite
+- **Not**: a SQL or graph-query analog of NARS revision
+- **Not**: production cutover from Bardioc
+- **Not**: a UI demo
+- **Not**: a research artifact about HHTL theory — the canary is the
+  *operational* proof, not a paper
+
+If the canary works, Bardioc cutover is a follow-on per-workload migration
+that can take months. The canary just has to demonstrate inhabitability of
+*one* path.
+
+## Where the canary lives
+
+| Component | Crate / path | Sprint |
+|---|---|---|
+| `Base17` + `Fingerprint` + `TruthValue` types | `ndarray::hpc::{nars,fingerprint,base17}` (existing) | — (pre-existing) |
+| `Hilbert3D::{encode,decode}` | `ndarray::hpc::linalg::hilbert` | PR-X10 A12 |
+| `CascadeAddr` + `from_position` + `XorProjection` | `ndarray::hpc::splat4d::cascade` | PR-X4 |
+| `SplatPyramid<T, S: GridStorage<T>, BR, BC>` | `ndarray::hpc::splat4d::pyramid` | PR-X4 + PR-X9 (GridStorage is PR-X9) |
+| `BasinCodebook` + `LazyBlockedGrid` | `ndarray::hpc::cognitive::{codebook,storage}` | PR-X9 |
+| rANS encode/decode + `CellMode` + `rdo_cell` | `ndarray::hpc::codec::*` | PR-X12 |
+| Per-pillar PASS gates (revision math certified) | `ndarray::hpc::pillar::*` | PR-X11 |
+| OGIT cognitive namespace bridge | `ndarray::hpc::ogit_bridge::*` | PR-X13 |
+| Ractor Rubicon gate (`RevisionHandler`) | `lance-graph::cognitive::nars_actor` (new) | Integration sprint |
+| SurrealDB egress (zone 2 typed surface) | `lance-graph::cognitive::nars_persist` (new) | Integration sprint |
+| End-to-end canary binary | `lance-graph/examples/nars_canary.rs` (new) | Integration sprint |
+| Measurement harness | `lance-graph/benches/nars_canary.rs` (new) | Integration sprint |
+
+The integration sprint produces the two `lance-graph::cognitive::*` modules
+that wire the substrate pieces together. The wiring is small (~200 LoC each);
+the substrate pieces are the work.
+
+## Composition with the 4-prompt strategic arc
+
+| Strategic prompt | Role | Canary relationship |
+|---|---|---|
+| `bardioc-weekend-rebuild-prompt.md` | Baseline measurement (legacy) | Produces the **NARS-revision-on-Bardioc** number the canary beats |
+| `ndarray-simd-trojan-horse-prompt.md` | Path A: ClickHouse + Tantivy FFI inject | **Independent** — analytic tier, not cognitive |
+| `databend-ndarray-simd-prompt.md` | Path C: Rust-native ClickHouse successor | **Independent** — analytic tier, not cognitive |
+| **THIS DOC + `hhtl-substrate-execution-prompt.md`** | Cognitive tier — the actual architectural win | Canary measures **revision-on-HHTL** vs the Bardioc baseline |
+
+The four-prompt arc handles the **analytic tier** (where ClickHouse used to
+live). This canary handles the **cognitive tier** (where HHTL lives). They
+compose: the analytic tier is Bardioc's escape hatch; the cognitive tier is
+the architecture's reason to exist.
+
+Both must work for the consolidation to be real. The cognitive canary is
+the harder and more important one.
+
+## Pass/fail decision
+
+If the canary passes all gates: HHTL is **inhabited**. Bardioc cognitive-tier
+cutover is a per-workload migration; analytic-tier cutover follows path A
+(buy time) or path C (replace). The consolidation arc is operationally
+proved.
+
+If the canary fails **performance gates** (latency/throughput): the
+architecture's algorithmic regime claim ("two orders of magnitude") is
+wrong. Re-examine the cascade depth, the codebook materialization cost,
+or the SIMD primitive coverage. Patch and re-measure.
+
+If the canary fails **correctness gates** (ULP/bit-exact): a substrate bug
+exists. P0 — block all dependent sprint work until resolved.
+
+If the canary fails **inhabitance gates** (qualitative): the substrate
+exists but isn't being lived in — the integration sprint built a
+translation layer instead of using the substrate primitives. Re-write
+the wiring, not the substrate.
+
+## Sequencing
+
+The canary cannot be implemented until the 6 substrate sprints land (the
+canary depends on PR-X4 + PR-X9 + PR-X10 A12 + PR-X11 + PR-X12 + PR-X13).
+**The canary is the integration sprint deliverable**, not a parallel track.
+
+The 6 sprints run per the master schedule (W1-W8 in
+`pr-master-consolidation.md`). Integration sprint = W8 = canary build +
+measure + record + write report.
+
+## What changes if the canary passes
+
+Three things become true that aren't true today:
+
+1. **The architecture document stops being a claim and becomes a measurement.**
+   The "700ns at depth 4" claim is now a number with confidence intervals.
+2. **Per-workload Bardioc cutover becomes mechanically composable.** Each
+   subsequent cognitive workload follows the canary pattern: typed surface
+   in, cascade lookup, codebook materialization, Rubicon commit, zone
+   boundary crossing. No new architectural decisions per workload.
+3. **The four strategic prompts can be executed with confidence.** Today
+   they read as "buy time + measure baseline + adopt successor." After
+   the canary passes, they read as "execute the cutover" with the cognitive
+   tier already proven.
+
+If the canary doesn't pass, those three things stay false — and the next
+session has to decide whether to debug the substrate or revisit the
+architecture.
diff --git a/.claude/knowledge/hhtl-substrate-execution-prompt.md b/.claude/knowledge/hhtl-substrate-execution-prompt.md
new file mode 100644
index 00000000..66c9c48e
--- /dev/null
+++ b/.claude/knowledge/hhtl-substrate-execution-prompt.md
@@ -0,0 +1,571 @@
+# HHTL Substrate Execution Prompt — Phase 2 Protocol A, 8 Weeks, 6 Sprints
+
+Master execution prompt for the 8-week / 6-sprint substrate build that takes
+PR-X4 + PR-X9 (and their dependencies PR-X10/X11/X12/X13) from **design to
+substrate**, culminating in the NARS-revision canary defined in
+`hhtl-canary-inhabitance-plan.md`.
+
+Companion docs (read first):
+- `pr-master-consolidation.md` — sprint plan + dependency DAG
+- `pr-master-consolidation-savant-verdict.md` — Phase 1 verdict (READY-WITH-DOC-FIXES, all 10 patches applied)
+- `pr-x4-design.md`, `pr-x9-design.md`, `pr-x10-linalg-core-design.md`, `pr-x11-jc-consolidation-design.md`, `pr-x12-codec-x265-design.md`, `pr-x13-ogit-bridge-design.md` — per-sprint specs
+- `hhtl-canary-inhabitance-plan.md` — the integration deliverable
+- `vertical-simd-consumer-contract.md` — SIMD primitives W1a contract
+- `.claude/rules/data-flow.md` — Rule #3
+
+This prompt is the **copy-paste-into-fresh-session** artifact that spawns
+each sprint per Protocol A. It is NOT a single Claude Code session — each
+sprint kickoff is its own session (Protocol A semantics make sprints
+parallelism-bounded, not session-bounded).
+
+---
+
+## How to use this prompt
+
+For each sprint window in the W1-W8 schedule, copy the relevant **§ Sprint
+kickoff** block below into a fresh Claude Code session. Authorize the listed
+tools. The session runs the sprint per Protocol A: preflight → 6 savants →
+workers → Codex P0 audit → P0 fix → P2 review → merge.
+
+Sessions in different windows are independent and can run on different
+days. Sessions within the same window (e.g. PR-X11 + PR-X13 in W3) are
+independent and can run in parallel. Each sprint produces its own PR off
+`claude/pr-x4-splat-cascade-design` (or successor session branches per
+session policy).
+
+---
+
+## Phase 2 Protocol A — the cadence each sprint follows
+
+Every sprint kickoff in the schedule below runs the same 7-step Protocol A:
+
+1. **Preflight skeleton** — coordinator agent writes a commented Rust
+   skeleton: all fn bodies `unimplemented!()`, all types stubbed, all
+   doc-comment data-flow rules in place. ~200-400 LoC depending on sprint
+   surface. Goal: get the API shape on the page before bodies exist.
+2. **Parallel-savant fan-out (6 specialists, same skeleton, no collision)**:
+   - `savant-architect` — layering, target_feature isolation, SoA shape
+   - `sentinel-qa` — SAFETY claims, `unsafe` block audit
+   - **data-flow-savant** — Rule #3, builder exemption, &mut/&self split
+   - **distance-typing-savant** — typed-distance discipline (no `Box<dyn>`)
+   - **naming-collision-savant** — symbol clashes with shipped crates
+   - **test-coverage-savant** — parity/property/integration test plan
+   Each writes a verdict against the preflight skeleton. Verdicts can be
+   PASS, BLOCK, or ADVISORY. BLOCK halts the sprint until resolved.
+3. **Workers fill bodies** — N workers (per-sprint count below), each
+   owning one file, parallel where the dependency graph permits. Workers
+   import the preflight types; they do not edit type signatures unless a
+   savant explicitly demanded it.
+4. **Codex P0 audit on combined diff** — runs against the whole sprint
+   diff once all workers report green. Codex is invoked via the existing
+   audit harness; output committed to `.claude/knowledge/pr-x{N}-codex-audit.md`.
+5. **Coordinator fixes P0s** — every P0 must be resolved before P2 review.
+6. **P2 savant pre-merge review** — joint plan-review savant with full
+   diff context. Output: SHIP / DO-NOT-SHIP / SHIP-AFTER-X. Committed to
+   `.claude/knowledge/pr-x{N}-p2-savant-review.md`.
+7. **Merge, integration test, signal next sprint** — merge gates: green
+   `cargo test --workspace --features <sprint-feature>`, green
+   `cargo clippy -- -D warnings`, SHIP verdict from P2 savant.
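+
+For orientation, a preflight skeleton from step 1 might look like the sketch
+below. Every name in it (`TruthValue`, `RevisionError`, `RevisionEngine`) is a
+hypothetical illustration, not a type taken from the sprint specs:
+
+```rust
+/// Stubbed value type; the field layout is fixed at preflight.
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub struct TruthValue {
+    pub f: f32,
+    pub c: f32,
+}
+
+/// Stubbed error type; variants are agreed before bodies exist.
+#[derive(Debug)]
+pub enum RevisionError {
+    NotMaterialized,
+}
+
+/// Stubbed engine; real fields would be decided during preflight.
+pub struct RevisionEngine;
+
+impl RevisionEngine {
+    /// Data-flow rule: compute takes `&self` and returns a new value.
+    /// The body is deliberately a stub for a worker to fill in step 3.
+    pub fn revise(
+        &self,
+        _prior: TruthValue,
+        _evidence: TruthValue,
+    ) -> Result<TruthValue, RevisionError> {
+        unimplemented!("worker fills this in during step 3")
+    }
+}
+
+fn main() {
+    // Coercing to a fn pointer checks the signature without calling the stub.
+    let _sig: fn(&RevisionEngine, TruthValue, TruthValue) -> Result<TruthValue, RevisionError> =
+        RevisionEngine::revise;
+    println!("skeleton type-checks; bodies are still stubs");
+}
+```
+
+A skeleton in this shape compiles, so the savants in step 2 can review
+signatures and doc-comments before any body is written.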
+
+PR-X3 had a post-merge UB finding (overlapping `&mut [T]`). Protocol A's
+data-flow-savant at step 2 would have caught that. The protocol is
+load-bearing — do not skip steps.
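+
+The PR-X3 finding is described here only as overlapping `&mut [T]`, so the
+following is not a reconstruction of that bug. It is a general sketch of the
+safe pattern: hand each worker a provably disjoint sub-slice via
+`split_at_mut`. The function name `double_in_parallel` is hypothetical:
+
+```rust
+use std::thread;
+
+/// Doubles every element using two workers, each owning a disjoint half.
+fn double_in_parallel(data: &mut [u32]) {
+    let mid = data.len() / 2;
+    // `split_at_mut` returns two non-overlapping mutable slices, so the
+    // borrow checker accepts both workers without any `unsafe`.
+    let (left, right) = data.split_at_mut(mid);
+    thread::scope(|s| {
+        s.spawn(move || {
+            for x in left {
+                *x = x.wrapping_mul(2);
+            }
+        });
+        s.spawn(move || {
+            for x in right {
+                *x = x.wrapping_mul(2);
+            }
+        });
+    }); // both workers are joined before the scope returns
+}
+
+fn main() {
+    let mut values = vec![1, 2, 3, 4, 5];
+    double_in_parallel(&mut values);
+    println!("{values:?}");
+}
+```
+
+Building the two slices from raw pointers with overlapping ranges would
+compile inside `unsafe` yet be undefined behavior, which is the class of
+claim a SAFETY audit exists to check.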
+
+---
+
+## Sprint sequencing — the 8-week schedule
+
+| Week | Sprints | Workers | Parallel? |
+|---|---|---|---|
+| W1-W2 | PR-X10 (linalg-core foundation) | 12 (max fan-out: A1 → A2-A12 parallel) | Internal max-parallel |
+| W3 | PR-X11 (jc consolidation) + PR-X13 (OGIT bridge) | 6 + 4 | Yes (independent sprints) |
+| W4-W5 | PR-X12 (codec) + PR-X4 (splat cascade) | 8 (effective 4-way per P1-4) + 5 | Yes (independent sprints) |
+| W6-W7 | PR-X9 (basin-codebook) | 6 | Single sprint (depends on X12 + X13) |
+| W8 | Integration + canary | 3 (canary build, harness, recording) | Single sprint (canary deliverable) |
+
+Total: 44 sprint workers + 6 coordinators + 6 specialist savants (re-used
+across sprints — stateless re-roles) over 8 weeks.
+
+---
+
+## § Sprint kickoff — W1-W2: PR-X10 (linalg-core foundation)
+
+```text
+You are coordinator for PR-X10 (linalg-core), the foundation sprint of the
+HHTL substrate arc. 12 max-fan-out workers; 2-week window; produces the
+`ndarray::hpc::linalg::*` surface that every downstream sprint consumes.
+
+READ FIRST:
+- `.claude/knowledge/pr-x10-linalg-core-design.md` — the per-worker A1-A12
+  decomposition; A12 is MANDATORY Hilbert-3D per joint savant scope-cut
+- `.claude/knowledge/pr-master-consolidation.md` — sprint plan + DAG
+- `.claude/knowledge/pr-master-consolidation-savant-verdict.md` — P0/P1
+  applied state; invariant 12 governs (master ruling: path (b))
+- `.claude/knowledge/vertical-simd-consumer-contract.md` — SIMD W1a gate
+- `.claude/rules/data-flow.md` — Rule #3
+
+WORKER DECOMPOSITION (12 max-fan-out):
+- A1 (sequential) — `linalg/mod.rs` + `MatN<const N>` foundation
+- A2 (parallel) — `linalg/quat.rs` (Quat algebra)
+- A3 (parallel) — `linalg/spd.rs` (Spd2/Spd3/SpdN, sandwich ops)
+- A4 (parallel) — `linalg/eig.rs` (eig_sym_3 closed-form + Jacobi general-N)
+- A5 (parallel) — `linalg/svd.rs` (Golub-Reinsch + one-sided Jacobi)
+- A6 (parallel) — `linalg/polar.rs` (polar decomposition)
+- A7 (parallel) — `linalg/mat_exp.rs` (matrix exponential, Padé)
+- A8 (parallel) — `linalg/sh.rs` (spherical harmonics deg 0..=7)
+- A9 (parallel) — `linalg/conv.rs` (Conv1d/2d/3d typed wrappers)
+- A10 (parallel) — `linalg/attention.rs` (naive + flash, both ship)
+- A11 (parallel) — `linalg/norm.rs` + `activations_ext.rs` + `rope.rs`
+- A12 (parallel, MANDATORY) — `linalg/hilbert.rs` (Butz/Skilling 3D Hilbert
+  encode/decode, ~200 LoC; consumed by PR-X4 splat4d::cascade::CascadeAddr)
+- Tier 3 OPTIONAL (rng/vml ext/fft ext/sparse/banded) — ship only if Tier
+  1+2 finish in window; defer otherwise
+
+PROTOCOL A — execute the 7 steps in `hhtl-substrate-execution-prompt.md`.
+The 6 specialist savants for the preflight review are listed there.
+
+ACCEPTANCE GATES:
+- All A1-A12 mandatory items merged with green tests, green clippy, green
+  codex P0 audit, SHIP verdict from P2 savant
+- `cargo test --workspace --features linalg` passes
+- W1a consumer contract honored for every new public SIMD-touching fn
+- Type aliases preserve splat3d::Spd3 for backward compat (invariant: full
+  type aliases ruling)
+- Closed-form + general-N coexist per invariant 12
+
+PR FORMAT: open one PR per worker (A1..A12), all targeting a single
+integration branch `pr-x10/linalg-core`. Coordinator merges the
+integration branch as one PR to master after Protocol A step 7.
+
+BUDGET: 2 weeks. If A1 slips, all 12 workers slip — coordinator's first
+job is unblocking A1 within 48 hours.
+
+NEXT SPRINTS: W3 spawns PR-X11 + PR-X13 in parallel once PR-X10 merges.
+```
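
The A12 item above names a Butz/Skilling 3D Hilbert encode/decode. As a rough illustration of the size and shape of that kernel, here is a minimal sketch of Skilling's transpose-based algorithm. The function names, the `fold` helper, and the choice of 10 bits per axis are assumptions made for this sketch; the real `linalg::hilbert` surface is defined in the PR-X10 design doc, not here.

```rust
/// Bits per axis. 10 bits x 3 axes = 30 bits, which fits a 4-byte address.
/// (Assumed value for this sketch, not taken from the design doc.)
pub const BITS: u32 = 10;

/// Shared inner step of Skilling's algorithm: depending on bit `q` of axis
/// `i`, either invert the low bits of axis 0 or exchange them with axis `i`.
fn fold(x: &mut [u32; 3], i: usize, q: u32) {
    let mask = q - 1;
    if x[i] & q != 0 {
        x[0] ^= mask;
    } else {
        let t = (x[0] ^ x[i]) & mask;
        x[0] ^= t;
        x[i] ^= t;
    }
}

/// Map (x, y, z) to its position along the Hilbert curve.
pub fn hilbert3d_encode(p: [u32; 3]) -> u32 {
    let mut x = p;
    let top = 1u32 << (BITS - 1);
    let mut q = top;
    while q > 1 {
        for i in 0..3 {
            fold(&mut x, i, q);
        }
        q >>= 1;
    }
    // Gray encode across the three axes.
    x[1] ^= x[0];
    x[2] ^= x[1];
    let mut t = 0u32;
    let mut q = top;
    while q > 1 {
        if x[2] & q != 0 {
            t ^= q - 1;
        }
        q >>= 1;
    }
    for v in x.iter_mut() {
        *v ^= t;
    }
    // Interleave the transposed bits into one index, axis 0 most significant.
    let mut idx = 0u32;
    for bit in (0..BITS).rev() {
        for &v in x.iter() {
            idx = (idx << 1) | ((v >> bit) & 1);
        }
    }
    idx
}

/// Inverse of `hilbert3d_encode`.
pub fn hilbert3d_decode(idx: u32) -> [u32; 3] {
    let mut x = [0u32; 3];
    // De-interleave the index back into the transposed form.
    for bit in 0..BITS {
        for i in 0..3 {
            let src = bit * 3 + (2 - i as u32);
            x[i] |= ((idx >> src) & 1) << bit;
        }
    }
    // Gray decode.
    let t = x[2] >> 1;
    x[2] ^= x[1];
    x[1] ^= x[0];
    x[0] ^= t;
    // Undo the fold steps, lowest bit first.
    let mut q = 2u32;
    while q != (1u32 << BITS) {
        for i in (0..3).rev() {
            fold(&mut x, i, q);
        }
        q <<= 1;
    }
    x
}
```

Two properties are worth gating on in tests: encode and decode round-trip, and consecutive indices decode to grid points that differ by exactly one step along one axis.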
+
+---
+
+## § Sprint kickoff — W3: PR-X11 (jc consolidation) + PR-X13 (OGIT bridge)
+
+These two sprints run **in parallel**; spawn one session each. They share
+no files and have no inter-sprint dependencies.
+
+### PR-X11 (jc consolidation, 6 workers, 1 week)
+
+```text
+You are coordinator for PR-X11 (jc consolidation). 6 workers; 1-week
+window; moves jc's Spd2/Spd3/Wasserstein/signature/cov_high_d math into
+`ndarray::hpc::pillar::*` per invariant 12.
+
+READ FIRST:
+- `.claude/knowledge/pr-x11-jc-consolidation-design.md` (Pillar-8 with
+  placeholder σ_temporal per joint savant P1-2)
+- `.claude/knowledge/pr-master-consolidation.md`
+- `.claude/knowledge/pr-master-consolidation-savant-verdict.md`
+- The relevant `lance-graph/crates/jc/src/*.rs` files that move
+
+WORKER DECOMPOSITION (6 workers):
+- B1 — `pillar/mod.rs` + Pillar-6 (Spd2 ewa_sandwich_2d, from jc)
+- B2 — Pillar-7 (Spd3 ewa_sandwich_3d + koestenberger, from jc)
+- B3 — Pillar-10 (Pflug Wasserstein-1, from jc/src/pflug.rs)
+- B4 — Pillar-8 (temporal_sandwich, NEW; placeholder σ_temporal +
+  `TODO(calibrate-pillar-8-σ_temporal)` per P1-2)
+- B5 — Pillar-9 (Cov16384 / cov_high_d, Düker-Zoubouloglou CLT)
+- B6 — Pillar-11 (Hambly-Lyons signature transform)
+
+PROTOCOL A — 7 steps.
+
+ACCEPTANCE GATES:
+- All 6 pillars implemented + probe runners shipped
+- Probe PASS gates: PSD rate ≥ 0.999, log-norm concentration verifiable
+- `#[deprecated]` markers added to `lance-graph/crates/jc/src/{ewa_sandwich,
+  ewa_sandwich_3d,koestenberger,pflug}.rs` with 1-cycle transition note
+- `ndarray::hpc::pillar::*` is the canonical home; jc becomes a thin
+  probe-runner that imports pillar
+- Pillar-8 ships with documented-arbitrary placeholder σ_temporal +
+  tracking issue link
+
+PARALLELISM: B1-B6 run in parallel after Protocol A step 1 (preflight)
+lands — none of them depend on each other. Hard fan-out = 6.
+
+BUDGET: 1 week. The user's "12 agents" cadence is the ceiling; this
+sprint hits 6 effective because pillars are file-scoped independent.
+```
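
The PSD-rate probe gate above can be pictured with a small sketch. It assumes that "sandwich" means the congruence product M·S·Mᵀ on a symmetric 2×2 matrix, and that the probe simply counts how many outputs stay positive semi-definite. `Spd2`, `sandwich`, `is_psd`, and `psd_rate` are hypothetical names for this illustration, not the jc or pillar API.

```rust
/// Symmetric 2x2 matrix [[a, b], [b, c]], stored as three scalars.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Spd2 {
    pub a: f64,
    pub b: f64,
    pub c: f64,
}

impl Spd2 {
    /// Sandwich product M * S * M^T. The result is symmetric by construction.
    pub fn sandwich(self, m: [[f64; 2]; 2]) -> Spd2 {
        // T = M * S
        let t00 = m[0][0] * self.a + m[0][1] * self.b;
        let t01 = m[0][0] * self.b + m[0][1] * self.c;
        let t10 = m[1][0] * self.a + m[1][1] * self.b;
        let t11 = m[1][0] * self.b + m[1][1] * self.c;
        // R = T * M^T (only the upper triangle is needed)
        Spd2 {
            a: t00 * m[0][0] + t01 * m[0][1],
            b: t00 * m[1][0] + t01 * m[1][1],
            c: t10 * m[1][0] + t11 * m[1][1],
        }
    }

    /// PSD test for a symmetric 2x2: diagonal entries and determinant
    /// must all be non-negative, up to a tolerance.
    pub fn is_psd(self, tol: f64) -> bool {
        self.a >= -tol && self.c >= -tol && self.a * self.c - self.b * self.b >= -tol
    }
}

/// Fraction of sandwich outputs that remain PSD; an empty probe set passes.
pub fn psd_rate(inputs: &[(Spd2, [[f64; 2]; 2])], tol: f64) -> f64 {
    if inputs.is_empty() {
        return 1.0;
    }
    let ok = inputs
        .iter()
        .filter(|(s, m)| s.sandwich(*m).is_psd(tol))
        .count();
    ok as f64 / inputs.len() as f64
}
```

In exact arithmetic a congruence product of a PSD matrix is always PSD, so any rate below 1.0 in such a probe would point at rounding or a bug rather than at the math.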
+
+### PR-X13 (OGIT bridge, 4 workers, 1 week)
+
+```text
+You are coordinator for PR-X13 (OGIT embedded TTL bundle). 4 workers;
+1-week window; replaces the lance-graph-ontology hop with embedded TTL
+files via `include_str!` per joint savant P0-3.
+
+READ FIRST:
+- `.claude/knowledge/pr-x13-ogit-bridge-design.md` (include_str! confirmed
+  per P0-3)
+- The 26 OGIT TTL files (mirror PR-Z1's spec)
+
+WORKER DECOMPOSITION (4 workers):
+- D1 — `ogit_bridge/mod.rs` + the trait surface
+- D2 — `ogit_bridge/cognitive.rs` (per-namespace bridge for cognitive)
+- D3 — `ogit_bridge/parser.rs` (Turtle parser over `include_str!` strings)
+- D4 — `assets/cognitive/*.ttl` + `embedded.rs` (the 26 TTL files +
+  include_str! wiring; ~50 LoC + 900 lines TTL)
+
+PROTOCOL A — 7 steps.
+
+ACCEPTANCE GATES:
+- `include_str!` validated UTF-8 at compile time (P0-3 ruling)
+- No `include_bytes!` references anywhere in the bridge code
+- TTL files baked into the binary (~150 KB compressed)
+- Bridge exposes `cognitive_ttls()` returning
+  `&'static [(&'static str, &'static str)]` (name, TTL body pairs)
+- Zero-startup-cost lookup (no runtime parsing for the embedded path)
+- `ndarray::hpc::ogit_bridge::*` is the canonical home; lance-graph-ontology
+  bridge pattern deprecated
+
+PARALLELISM: D1 sequential (mod.rs foundation), then D2/D3/D4 parallel.
+
+BUDGET: 1 week.
+```
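
A minimal sketch of the embedded-bundle shape the gates above describe. In the real bridge each body would presumably come from `include_str!` over a file under `assets/cognitive/`; inline literals stand in here so the sketch compiles without asset files. The entry names, the TTL content, the `example.org` prefix, and the `cognitive_ttl` lookup helper are all made up for illustration.

```rust
// Stand-ins for `include_str!("assets/cognitive/<name>.ttl")`.
// The TTL text below is placeholder content, not a real OGIT file.
const BELIEF_TTL: &str =
    "@prefix ogit: <http://example.org/ogit#> .\nogit:Belief a ogit:Entity .\n";
const GOAL_TTL: &str =
    "@prefix ogit: <http://example.org/ogit#> .\nogit:Goal a ogit:Entity .\n";

/// The bundle lives in static memory; nothing is read or built at startup.
static COGNITIVE_TTLS: [(&str, &str); 2] = [("belief", BELIEF_TTL), ("goal", GOAL_TTL)];

/// All embedded (name, TTL body) pairs.
pub fn cognitive_ttls() -> &'static [(&'static str, &'static str)] {
    &COGNITIVE_TTLS
}

/// Hypothetical convenience lookup by name (linear scan over a small table).
pub fn cognitive_ttl(name: &str) -> Option<&'static str> {
    cognitive_ttls()
        .iter()
        .find(|(n, _)| *n == name)
        .map(|(_, body)| *body)
}
```

Because `include_str!` yields a `&'static str`, a non-UTF-8 asset fails the build rather than surfacing at runtime, which is the property the P0-3 gate relies on.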
+
+---
+
+## § Sprint kickoff — W4-W5: PR-X12 (codec) + PR-X4 (splat cascade)
+
+These two sprints run **in parallel**; spawn one session each. They share
+no files but PR-X9 (W6-W7) depends on both.
+
+### PR-X12 (codec, 8 workers / 4-way effective parallel, 2 weeks)
+
+```text
+You are coordinator for PR-X12 (x265-style codec for cognitive basin
+compression). 8 workers; 4-way effective parallel per joint savant P1-4;
+2-week window.
+
+READ FIRST:
+- `.claude/knowledge/pr-x12-codec-x265-design.md` — RansEncoder docstring
+  per P0-1; tinyvec::ArrayVec<[CtuPartition; 85]> per P0-2; A2-A5 parallel
+  then A6-A7 parallel then A8 sequential per P1-4
+- `.claude/knowledge/pr-master-consolidation-savant-verdict.md`
+
+WORKER DECOMPOSITION (8 workers, max effective 4-way):
+- A1 (sequential) — `codec/ctu.rs` (Ctu carrier + CtuPartition + quad-tree)
+- A2 (parallel after A1) — `codec/mode.rs` (CellMode enum, 4 modes per
+  P1-4 ruling: skip/merge/delta/escape)
+- A3 (parallel) — `codec/predict.rs` (per-mode prediction)
+- A4 (parallel) — `codec/transform.rs` (DCT-like spatial xform on cell
+  deltas)
+- A5 (parallel) — `codec/quantize.rs` (quantization with `RdoConfig`)
+- A6 (parallel after A2-A5) — `codec/rdo.rs` (λ-RDO loop + `rdo_cell`)
+- A7 (parallel after A2-A5) — `codec/rans.rs` (rANS encoder; encode_symbol
+  has the builder-exemption docstring per P0-1)
+- A8 (sequential after A7) — `codec/stream.rs` (pack/unpack stream format)
+
+PROTOCOL A — 7 steps.
+
+ACCEPTANCE GATES:
+- `RansEncoder::encode_symbol(&mut self)` carries Rule #3 builder
+  exemption docstring (P0-1)
+- `CtuPartition` quad-tree uses stack-arena pattern (tinyvec::ArrayVec or
+  pre-allocated Vec indexed by u16); no `Box<CtuPartition>` heap allocs
+  on the RDO loop hot path (P0-2)
+- 4 codec modes (skip/merge/delta/escape); 5th mode (basin-shift)
+  collapsed into escape
+- rANS chosen over CABAC (cognitive symbol skew justifies)
+- `cargo test --workspace --features codec` passes
+- Compression ratio ≥ 5:1 on synthetic basin codebook fixtures
+
+PARALLELISM: per P1-4 ruling, **4-way max** (A2-A5), not 6-way. A1 → 
+[A2,A3,A4,A5] → [A6,A7] → A8.
+
+BUDGET: 2 weeks.
+```
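
The P0-2 gate above asks for a stack-arena quad-tree with no `Box<CtuPartition>` on the hot path. The sketch below shows one way that could look using only a fixed array and `u16` indices; the design doc also allows `tinyvec::ArrayVec`, which is not used here. The field layout and the `CtuArena` type are assumptions for illustration. The capacity of 85 matches a full quad-tree of depth 3 (1 + 4 + 16 + 64).

```rust
/// Node capacity of a full quad-tree of depth 3: 1 + 4 + 16 + 64.
pub const MAX_NODES: usize = 85;
const MAX_DEPTH: u8 = 3;
const NO_CHILD: u16 = u16::MAX;

#[derive(Clone, Copy)]
pub struct CtuPartition {
    pub depth: u8,
    /// Index of the first of four contiguous children, or NO_CHILD for a leaf.
    pub first_child: u16,
}

/// Fixed-capacity arena: the whole tree lives inline, no heap allocation.
pub struct CtuArena {
    nodes: [CtuPartition; MAX_NODES],
    len: u16,
}

impl CtuArena {
    pub fn new() -> Self {
        let root = CtuPartition { depth: 0, first_child: NO_CHILD };
        CtuArena { nodes: [root; MAX_NODES], len: 1 }
    }

    /// Split a leaf into four children and return the index of the first.
    /// Returns None if `idx` is out of range, already split, at max depth,
    /// or if the arena has no room for four more nodes.
    pub fn split(&mut self, idx: u16) -> Option<u16> {
        if idx >= self.len {
            return None;
        }
        let node = self.nodes[idx as usize];
        if node.first_child != NO_CHILD
            || node.depth >= MAX_DEPTH
            || self.len as usize + 4 > MAX_NODES
        {
            return None;
        }
        let first = self.len;
        for k in 0..4u16 {
            self.nodes[(first + k) as usize] =
                CtuPartition { depth: node.depth + 1, first_child: NO_CHILD };
        }
        self.nodes[idx as usize].first_child = first;
        self.len += 4;
        Some(first)
    }

    pub fn len(&self) -> usize {
        self.len as usize
    }

    pub fn leaf_count(&self) -> usize {
        self.nodes[..self.len as usize]
            .iter()
            .filter(|n| n.first_child == NO_CHILD)
            .count()
    }
}
```

Splitting every node in index order fills the arena exactly, which makes a convenient capacity test for the RDO loop's worst case.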
+
+### PR-X4 (splat cascade, 5 workers, 1 week)
+
+```text
+You are coordinator for PR-X4 (splat4d temporal cascade onto BlockedGrid).
+5 workers; 1-week window. Interim worktree path `src/hpc/splat3d/v2/` per
+P1-3; public module path `crate::hpc::splat4d::*` from day one via
+mod.rs re-export.
+
+READ FIRST:
+- `.claude/knowledge/pr-x4-design.md` — module path clarification per P1-3
+- `.claude/knowledge/pr-x10-linalg-core-design.md` — A12 Hilbert-3D
+  consumed by `CascadeAddr::from_position`
+- `.claude/knowledge/pr-master-consolidation-savant-verdict.md`
+
+WORKER DECOMPOSITION (5 workers):
+- C1 (sequential) — `splat4d/mod.rs` + `CascadeAddr` type (4 bytes, cache-
+  aligned, parent/children via shift-mask)
+- C2 (parallel after C1) — `splat4d/cascade.rs` (L1-L4 cascade hops; XOR
+  projection; consumes `linalg::hilbert::Hilbert3D::encode` from PR-X10)
+- C3 (parallel) — `splat4d/pyramid.rs` (SplatPyramid<T, S: GridStorage<T>,
+  BR, BC>; storage is generic over PR-X9's GridStorage trait, defaults
+  to BlockedGrid for v1)
+- C4 (parallel) — `splat4d/temporal_sandwich.rs` (Pillar-8 consumer +
+  temporal drift sandwich)
+- C5 (parallel) — `splat4d/raster.rs` (cascade-aware rasterization;
+  backward-compat shim wrapping splat3d::tile.rs)
+
+PROTOCOL A — 7 steps.
+
+ACCEPTANCE GATES:
+- `crate::hpc::splat4d::*` reachable from day one via mod.rs re-export
+  (P1-3)
+- CascadeAddr is 4 bytes, deterministic XOR cascade
+- L1-L4 hop traversal in <400ns p99 (cache-resident path) — see
+  `hhtl-canary-inhabitance-plan.md` performance gate 3
+- splat3d::tile.rs becomes a shim, deprecated 1-cycle
+- SplatPyramid storage-polymorphic over GridStorage<T> (PR-X9 trait)
+- `cargo test --workspace --features splat4d` passes
+
+PARALLELISM: C1 sequential, then C2-C5 parallel. 4-way effective.
+
+BUDGET: 1 week.
+
+NEXT SPRINT: W6-W7 spawns PR-X9 once both PR-X12 and PR-X4 merge.
+```
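
The C1 item above describes `CascadeAddr` as 4 bytes with parent/children via shift-mask. One possible bit layout is sketched below: the top 2 bits hold the level (four levels, L1 to L4) and the low 30 bits hold a path of 3-bit octant digits. This layout is purely an assumption for illustration; the actual encoding, including how the XOR projection and the Hilbert index enter, is defined in the PR-X4 design doc.

```rust
/// 4-byte cascade address. Assumed layout: [level: 2 bits | path: 30 bits].
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct CascadeAddr(u32);

const LEVEL_SHIFT: u32 = 30;
const PATH_MASK: u32 = (1 << LEVEL_SHIFT) - 1;
const MAX_LEVEL: u32 = 3; // levels 0..=3 stand for L1..L4

impl CascadeAddr {
    pub const ROOT: CascadeAddr = CascadeAddr(0);

    pub fn level(self) -> u32 {
        self.0 >> LEVEL_SHIFT
    }

    pub fn path(self) -> u32 {
        self.0 & PATH_MASK
    }

    /// Descend one level by appending a 3-bit octant digit to the path.
    pub fn child(self, octant: u32) -> Option<CascadeAddr> {
        if self.level() >= MAX_LEVEL || octant > 7 {
            return None;
        }
        let level = self.level() + 1;
        Some(CascadeAddr((level << LEVEL_SHIFT) | (self.path() << 3) | octant))
    }

    /// Ascend one level by dropping the last octant digit.
    pub fn parent(self) -> Option<CascadeAddr> {
        if self.level() == 0 {
            return None;
        }
        let level = self.level() - 1;
        Some(CascadeAddr((level << LEVEL_SHIFT) | (self.path() >> 3)))
    }
}
```

With this kind of layout both hops are a couple of shifts and masks on one register, which is the property the sub-400 ns traversal gate depends on.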
+
+---
+
+## § Sprint kickoff — W6-W7: PR-X9 (basin-codebook, 6 workers, 1.5 weeks)
+
+```text
+You are coordinator for PR-X9 (lazy basin-codebook with LazyBlockedGrid).
+6 workers; 1.5-week window. Depends on PR-X12 (codec primitives) and
+PR-X13 (OGIT bridge). Per P0-4, PR-X9 A5 uses PR-X12's codec primitives
+verbatim (no codec re-implementation in this sprint).
+
+READ FIRST:
+- `.claude/knowledge/pr-x9-design.md` — GridStorage trait with
+  `T: Copy, const BR, const BC` type params per P1-5; A5 narrowed scope
+  per P0-4
+- `.claude/knowledge/pr-x12-codec-x265-design.md` — the codec surface
+  PR-X9 A5 consumes
+- `.claude/knowledge/pr-x13-ogit-bridge-design.md` — the OGIT cognitive
+  namespace PR-X9 attaches basins to
+
+WORKER DECOMPOSITION (6 workers):
+- E1 (sequential) — `cognitive/storage.rs` (GridStorage<T: Copy, const BR,
+  const BC> trait + impl for BlockedGrid per P1-5 stable-1.94 fix)
+- E2 (parallel after E1) — `cognitive/lazy_grid.rs` (LazyBlockedGrid<T, BR,
+  BC>: present-cells in BlockedGrid, absent-cells materialized on demand
+  under Rubicon write-back gate)
+- E3 (parallel) — `cognitive/codebook.rs` (BasinCodebook: per-cell rANS-
+  encoded payload + decode-on-access cache; bounded LRU)
+- E4 (parallel) — `cognitive/revise.rs` (NARS revision lifted to
+  GridStorage<T>; consumes Pillar-7 from `ndarray::hpc::pillar` for
+  certification)
+- E5 (parallel) — `cognitive/encode.rs` (encode_from_dense using
+  `ndarray::hpc::codec::{CellMode, MergeDir, rdo_cell, RdoConfig}` per
+  P0-4 — no codec re-impl)
+- E6 (parallel) — `cognitive/parity.rs` (BlockedGrid ↔ LazyBlockedGrid
+  cell-by-cell parity test harness; integration target)
+
+PROTOCOL A — 7 steps.
+
+ACCEPTANCE GATES:
+- GridStorage trait compiles on stable Rust 1.94 (no generic const
+  expressions) per P1-5
+- LazyBlockedGrid implements GridStorage<T, BR, BC> with on-demand
+  materialization under Rubicon write-back gate (single-target gated XOR
+  semantics per data-flow.md Rule #3)
+- Codec surface imported from PR-X12, not re-implemented (P0-4)
+- BlockedGrid ↔ LazyBlockedGrid parity: per-cell L1 distance ≤
+  `epsilon_floor` for any RdoConfig
+- `cargo test --workspace --features cognitive` passes
+- Codebook hit rate target ≥ 95% on warmed-up workload (canary gate
+  performance #4)
+
+PARALLELISM: E1 sequential (GridStorage foundation), then E2-E6 parallel.
+5-way effective.
+
+BUDGET: 1.5 weeks.
+
+NEXT SPRINT: W8 integration + canary.
+```
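
The P1-5 gate above requires the `GridStorage` trait to compile on stable Rust without generic const expressions. A minimal sketch of how that can work: keep `BR` and `BC` as plain const parameters and store blocks as nested arrays, since `[T; BR * BC]` would need the unstable `generic_const_exprs` feature. The method set, the `HashMap`-backed lazy grid, and `max_cell_distance` are assumptions for this sketch, not the PR-X9 surface.

```rust
use std::collections::HashMap;

/// Storage abstraction over a BR x BC block of cells.
pub trait GridStorage<T: Copy, const BR: usize, const BC: usize> {
    fn get(&self, r: usize, c: usize) -> T;
    fn set(&mut self, r: usize, c: usize, v: T);
}

/// Dense block. Nested arrays keep this on stable Rust.
pub struct BlockedGrid<T: Copy, const BR: usize, const BC: usize> {
    cells: [[T; BC]; BR],
}

impl<T: Copy, const BR: usize, const BC: usize> BlockedGrid<T, BR, BC> {
    pub fn filled(v: T) -> Self {
        Self { cells: [[v; BC]; BR] }
    }
}

impl<T: Copy, const BR: usize, const BC: usize> GridStorage<T, BR, BC>
    for BlockedGrid<T, BR, BC>
{
    fn get(&self, r: usize, c: usize) -> T {
        self.cells[r][c]
    }
    fn set(&mut self, r: usize, c: usize, v: T) {
        self.cells[r][c] = v;
    }
}

/// Lazy block: absent cells read as `default` and are materialized on write.
pub struct LazyBlockedGrid<T: Copy, const BR: usize, const BC: usize> {
    present: HashMap<(usize, usize), T>,
    default: T,
}

impl<T: Copy, const BR: usize, const BC: usize> LazyBlockedGrid<T, BR, BC> {
    pub fn new(default: T) -> Self {
        Self { present: HashMap::new(), default }
    }
    pub fn materialized(&self) -> usize {
        self.present.len()
    }
}

impl<T: Copy, const BR: usize, const BC: usize> GridStorage<T, BR, BC>
    for LazyBlockedGrid<T, BR, BC>
{
    fn get(&self, r: usize, c: usize) -> T {
        assert!(r < BR && c < BC, "cell out of bounds");
        *self.present.get(&(r, c)).unwrap_or(&self.default)
    }
    fn set(&mut self, r: usize, c: usize, v: T) {
        assert!(r < BR && c < BC, "cell out of bounds");
        self.present.insert((r, c), v);
    }
}

/// Parity harness core: largest per-cell absolute difference.
pub fn max_cell_distance<const BR: usize, const BC: usize>(
    dense: &BlockedGrid<f32, BR, BC>,
    lazy: &LazyBlockedGrid<f32, BR, BC>,
) -> f32 {
    let mut worst = 0.0f32;
    for r in 0..BR {
        for c in 0..BC {
            worst = worst.max((dense.get(r, c) - lazy.get(r, c)).abs());
        }
    }
    worst
}
```

A parity gate in this style would compare `max_cell_distance` against `epsilon_floor` after replaying the same writes into both storages.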
+
+---
+
+## § Sprint kickoff — W8: Integration + Canary (3 workers, 1 week)
+
+```text
+You are coordinator for the integration sprint. 3 workers; 1-week window;
+delivers the **NARS-revision canary** defined in
+`hhtl-canary-inhabitance-plan.md`.
+
+This sprint is where the substrate stops being parts and becomes a system.
+
+READ FIRST:
+- `.claude/knowledge/hhtl-canary-inhabitance-plan.md` — THE canary spec
+  (workload, 11 substrate steps, correctness gates, performance gates,
+  inhabitance gates)
+- `.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md` — Rubicon
+  model, zone boundaries, three-legged stool
+- `.claude/knowledge/bardioc-weekend-rebuild-prompt.md` — the baseline
+  the canary measures against
+
+WORKER DECOMPOSITION (3 workers):
+- F1 — `lance-graph/cognitive/nars_actor.rs` (~200 LoC)
+  - Ractor actor with mailbox = `PerceptualSurface`
+  - Handler = Rubicon crossing: cascade route → basin lookup →
+    materialize-on-cold → NARS revise → write-back via gated XOR
+  - Per-thought bindspace owned by the message lifetime
+  - `&mut self` ONLY in the handler, and only for the gated commit
+- F2 — `lance-graph/cognitive/nars_persist.rs` (~200 LoC)
+  - Zone-1→zone-2 boundary: typed surface in (NarsBeliefRevision),
+    SurrealDB ACID-tx out
+  - Typed surface defined in ndarray::hpc::*; no DTO layer
+  - Zone-2→zone-3 (sea-orm SQL egress) optional and behind a feature flag
+- F3 — `lance-graph/examples/nars_canary.rs` + `lance-graph/benches/nars_canary.rs`
+  - End-to-end binary: ingest 1M synthetic perceptual surfaces, route
+    through HHTL cascade, revise, commit, measure
+  - Bench harness: p50/p95/p99 latency (warm + cold), throughput,
+    codebook hit rate
+  - 30-second screen recording committed to repo
+
+PROTOCOL A — 7 steps (lightweight — small surface, 3 workers).
+
+CANARY ACCEPTANCE GATES (from hhtl-canary-inhabitance-plan.md):
+
+Correctness (binary, all must pass):
+1. Revision output bit-exact (Fingerprint) and ULP ≤ 4 (TruthValue) vs
+   `src/hpc/nars.rs::revise` scalar reference, 10,000 seeded revisions,
+   zero divergences
+2. Cascade routing deterministic across 100 runs
+3. No `&mut self` in compute paths (clippy + sentinel-qa audit)
+4. No static/lazy_static carrying mutable cognitive state in zone 1
+5. Typed surfaces at zone boundaries (no serde_json::Value, no DTOs)
+
+Performance (numeric, all must pass on Zen4 or SPR 8-core, AVX-512):
+1. p99 revision latency warm: ≤ 1.5 µs
+2. p99 revision latency cold: ≤ 15 µs
+3. Cascade-only latency: ≤ 400 ns p99
+4. Codebook hit rate after 1M warmup: ≥ 95%
+5. Throughput saturated: ≥ 1M revisions/sec per core sustained 10s
+6. Working set per worker: ≤ 1 MB
+7. ndarray::simd primitive coverage: 100% of hot-path SIMD
+
+Inhabitance (qualitative):
+1. Canary code reads like the architecture document — 11 substrate steps
+   traceable to 11 specific function calls
+2. No Bardioc-shaped code in canary path (no SQL builders, no ES DSL,
+   no JanusGraph traversals, no ClickHouse aggregations)
+3. Sentinel-qa P0 SAFETY findings on new code: zero
+4. 30-second screen recording committed (canary running end-to-end, p99
+   on screen, hit rate climbing during warmup)
+
+DELIVERABLE: `.claude/knowledge/pr-x4-x9-canary-results.md` — measured
+numbers per gate; SHIP / RE-MEASURE / RE-ARCHITECT decision; comparison
+against the Bardioc baseline from bardioc-weekend-rebuild-prompt.md (if
+the baseline has been run); next-steps recommendations.
+
+BUDGET: 1 week. If a gate fails, document the failure in the results
+doc, then decide:
+- Performance fail → re-examine cascade depth, codebook materialization
+  cost, or SIMD primitive coverage; patch and re-measure
+- Correctness fail → P0; block dependent sprint work until resolved
+- Inhabitance fail → re-write the wiring (F1/F2), not the substrate
+
+CANARY OUTCOME:
+PASS → HHTL is operationally proved; per-workload Bardioc cutover becomes
+    mechanically composable; analytic-tier paths (A: trojan horse, C:
+    Databend) can be executed with confidence.
+FAIL → HHTL claim is not yet validated; the next session decides whether
+    to debug the substrate or revisit the architecture.
+```
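
The F1 handler shape and correctness gate 3 above (pure compute, `&mut self` only for the gated commit) can be sketched without any actor framework. The revision rule below is the textbook NARS formula (confidence converted to evidence weight, weights added, then converted back); whether `src/hpc/nars.rs::revise` uses exactly this form, and the evidential horizon `K = 1.0`, are assumptions. `NarsActor` is a plain struct standing in for the Ractor actor.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TruthValue {
    pub frequency: f64,
    pub confidence: f64,
}

/// Evidential horizon (assumed value for this sketch).
const K: f64 = 1.0;

/// Pure revision: no `&mut`, no shared state. Expects confidence in [0, 1).
pub fn revise(a: TruthValue, b: TruthValue) -> TruthValue {
    // Convert confidence to evidence weight: w = k * c / (1 - c).
    let weight = |t: TruthValue| K * t.confidence / (1.0 - t.confidence);
    let (wa, wb) = (weight(a), weight(b));
    let total = wa + wb;
    if total == 0.0 {
        return a; // no evidence on either side; nothing to revise
    }
    TruthValue {
        frequency: (wa * a.frequency + wb * b.frequency) / total,
        confidence: total / (total + K),
    }
}

/// Stand-in for the actor: owns the committed beliefs.
pub struct NarsActor {
    committed: Vec<TruthValue>,
}

impl NarsActor {
    pub fn new(committed: Vec<TruthValue>) -> Self {
        NarsActor { committed }
    }

    pub fn belief(&self, slot: usize) -> TruthValue {
        self.committed[slot]
    }

    /// Handler body: compute is a pure call; the single write at the end
    /// is the gated commit.
    pub fn handle(&mut self, slot: usize, evidence: TruthValue) -> TruthValue {
        let revised = revise(self.committed[slot], evidence);
        self.committed[slot] = revised;
        revised
    }
}
```

Keeping `revise` free of `&mut` is what lets a SIMD path be checked against this scalar form value by value, as the 10,000-revision gate requires.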
+
+---
+
+## Cross-sprint operational notes
+
+### Specialist savant rotation
+
+The 6 specialist savants are **stateless re-roles**, not per-sprint
+incarnations. The same `data-flow-savant` reviews PR-X10 preflight, then
+PR-X11 preflight, then PR-X12 preflight, etc. This reduces savant
+context-switch overhead, per the joint savant decision 6 ruling.
+
+### Codex P0 audits
+
+Run codex on the combined sprint diff at step 4, not per-worker. Output
+goes to `.claude/knowledge/pr-x{N}-codex-audit.md`. Coordinators must
+resolve every P0 before P2 review at step 6.
+
+### Branch hygiene
+
+Each sprint uses an integration branch (`pr-x{N}/integration`); per-worker
+PRs target the integration branch, coordinator merges integration to
+master after Protocol A step 7. Avoids 12 simultaneous PRs to master.
+
+### Deprecation cycle
+
+PR-X11 marks jc files `#[deprecated(since="0.X", note="moved to
+ndarray::hpc::pillar")]` for one cycle. Removal in cycle N+2. PR-X13
+supersedes lance-graph-ontology bridge pattern with the same cadence.
+
+### Feature gate matrix (additive)
+
+```toml
+# Default
+default = ["std", "linalg"]
+
+# Per-sprint
+splat3d        = ["dep:..."]
+splat4d        = ["splat3d", "linalg"]
+blocked_grid   = ["std"]
+linalg         = ["std"]
+pillar         = ["linalg"]
+codec          = ["std", "blocked_grid"]
+ogit_bridge    = ["std"]
+cognitive      = ["blocked_grid", "linalg", "codec", "ogit_bridge"]
+
+# Aggregates
+cognitive_full = ["cognitive", "splat4d", "pillar"]
+```
+
+Default builds stay small; canary opts in to `cognitive_full`.
+
+### Backward compat for splat3d consumers
+
+`pub use crate::hpc::linalg::Spd3;` etc. A `pub use` re-export (like a
+`type` alias) names the same type rather than introducing a new one, so
+existing splat3d consumers compile unchanged after PR-X10 lands.
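
A small sketch of why the re-export keeps old call sites compiling. The module names mirror the ones above, but the `Spd3` field layout (six upper-triangle entries) and the `trace` function are invented for this illustration.

```rust
pub mod linalg {
    /// Canonical home of the type after consolidation.
    /// Assumed layout: [xx, xy, xz, yy, yz, zz].
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Spd3(pub [f32; 6]);
}

pub mod splat3d {
    /// Re-export: `splat3d::Spd3` and `linalg::Spd3` are one and the same type.
    pub use super::linalg::Spd3;

    /// A pre-existing consumer written against `splat3d::Spd3`.
    pub fn trace(s: Spd3) -> f32 {
        s.0[0] + s.0[3] + s.0[5]
    }
}
```

A value built through the new path can be passed straight to code that still spells the old path, with no conversion.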
+
+---
+
+## What this prompt does NOT do
+
+- It does not run the 4-prompt analytic-tier arc (Bardioc baseline,
+  trojan horse, Databend). Those are independent and can run in parallel
+  with the substrate arc. The canary measures against the Bardioc
+  baseline if it has been run; absent that, the canary measures absolute
+  numbers.
+- It does not migrate Bardioc workloads. The canary proves
+  *inhabitability* of one workload; per-workload migration is a follow-on
+  multi-month effort.
+- It does not address HHTL theory or paper-writing. The canary is the
+  operational proof; theory artifacts are downstream.
+- It does not contain code. It contains the kickoff prompts for each
+  sprint session; code is written inside those sessions.
+
+---
+
+## Done criteria (substrate arc, 8 weeks)
+
+The substrate arc is "done" when:
+
+- All 6 sprints land per the W1-W8 schedule (44 sprint workers + 6
+  coordinators + 6 specialist savants)
+- `ndarray::hpc::*` 10-submodule layout is the canonical structure
+- jc deprecated 1 cycle; lance-graph-ontology bridge pattern superseded
+- The NARS-revision canary passes all 3 gate classes (correctness +
+  performance + inhabitance)
+- 30-second screen recording committed showing canary running end-to-end
+- `.claude/knowledge/pr-x4-x9-canary-results.md` written with measured
+  numbers and SHIP / RE-MEASURE / RE-ARCHITECT decision
+
+If all six criteria hit on schedule: HHTL is inhabited. Bardioc cognitive-
+tier cutover is now a mechanical per-workload migration; the analytic
+tier follows path A or path C per the four-prompt arc. The architecture
+that started as a strategic document is now an operational substrate.
diff --git a/.claude/knowledge/pr-x10-linalg-core-design.md b/.claude/knowledge/pr-x10-linalg-core-design.md
index 02098e78..c5d5cd2d 100644
--- a/.claude/knowledge/pr-x10-linalg-core-design.md
+++ b/.claude/knowledge/pr-x10-linalg-core-design.md
@@ -42,6 +42,7 @@ src/hpc/linalg/
 ├── svd.rs                — Golub-Reinsch + one-sided Jacobi SVD
 ├── polar.rs              — A = U·P decomposition (built on SVD)
 ├── matfn.rs              — mat_exp + mat_log (Padé + scaling-and-squaring)
+├── distance.rs           — L1 / L2 / L∞ over f64x8 lanes (absorbed from PR #160 heel_f64x8)
 ├── quat.rs               — Quat carrier + algebra (mul, conjugate, slerp, from_axis_angle, to_mat)
 ├── sh.rs                 — extended SH (deg 0..=7) — supersedes splat3d/sh.rs deg-3 only
 ├── conv.rs               — Conv1D + Conv2D (im2col + gemm path, direct path for small kernels)
@@ -169,6 +170,26 @@ Higham's scaling-and-squaring Padé(13/13) for general matrices (3 × ε_machine
 
 **Precision class: EXACT** for SPD path (via `eig_sym` + scalar `vml::exp_f32`/`vml::ln_f32`); **VERIFY** for general path (Padé approximant order vs scaling depth trade-off).
 
+### Distance kernels — `linalg::distance`
+
+```rust
+pub fn l1_f64_simd(a: &[f64], b: &[f64]) -> f64 { ... }     // Σ |a_i − b_i|
+pub fn l2_f64_simd(a: &[f64], b: &[f64]) -> f64 { ... }     // √Σ (a_i − b_i)²
+pub fn linf_f64_simd(a: &[f64], b: &[f64]) -> f64 { ... }   // max |a_i − b_i|
+```
+
+Lane-parallel over `F64x8` with horizontal reduce at the tail. Absorbs the
+`heel_f64x8::l1/l2/linf` kernels from PR #160 (lance-graph) — the code is
+correct, the framing was wrong (it was filed as "Sprint 0a of a four-repo
+integration arc"; the right home is here, alongside polar / matfn in the
+linalg core). Bench parity vs the PR #160 implementation is part of the A6
+acceptance gate, not a separate worker.
+
+**Precision class: EXACT** for L∞ (`abs` and `max` are exact, so the only
+rounding is the per-element subtract). **VERIFY** for L1 and L2: both sums
+are order-of-summation dependent, and L2 adds one more rounding in the final
+`sqrt`. A6 uses pairwise reduce for determinism, same shape as
+`blas_level1::nrm2`.
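
For bench and parity work it helps to have a scalar reference that accumulates in the same order as an 8-lane kernel. The sketch below stripes elements across eight accumulators and combines them with a fixed pairwise tree. The `_ref` names and the tail handling (tail elements simply land in the low lanes) are assumptions for this illustration, not the `heel_f64x8` code.

```rust
const LANES: usize = 8;

/// Fixed-order pairwise sum of the lane accumulators, so results do not
/// depend on how a particular backend happens to reduce horizontally.
fn reduce_pairwise(acc: [f64; LANES]) -> f64 {
    ((acc[0] + acc[1]) + (acc[2] + acc[3])) + ((acc[4] + acc[5]) + (acc[6] + acc[7]))
}

/// L1 distance: sum of |a_i - b_i|.
pub fn l1_f64_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f64; LANES];
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        acc[i % LANES] += (x - y).abs();
    }
    reduce_pairwise(acc)
}

/// L2 distance: sqrt of the sum of (a_i - b_i)^2.
pub fn l2_f64_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f64; LANES];
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        let d = x - y;
        acc[i % LANES] += d * d;
    }
    reduce_pairwise(acc).sqrt()
}

/// L-infinity distance: max of |a_i - b_i|. Order-independent.
pub fn linf_f64_ref(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .fold(0.0f64, |m, (x, y)| m.max((x - y).abs()))
}
```

Because the accumulation order is pinned, a SIMD kernel that stripes the same way should match this reference bit for bit on L1 and on the pre-`sqrt` L2 sum.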
+
 ### Higher-degree SH — `linalg::sh`
 
 Supersedes `splat3d::sh.rs` (which ships deg-3 only). Adds deg-4 through deg-7:
@@ -459,7 +480,7 @@ This is a LARGE sprint. Per the user's "12 agents + 1 coordinator" cadence:
 | 6 | **A3 — Matrix inverse (3×3, 4×4, general)** | 1 | `linalg/inverse.rs` | ~300 |
 | 7 | **A4 — Symmetric eig (Jacobi + QR)** | 1 | `linalg/eig_sym.rs` | ~450 |
 | 8 | **A5 — SVD (Golub-Reinsch + one-sided Jacobi)** | 1 | `linalg/svd.rs` | ~500 |
-| 9 | **A6 — Polar + mat_exp + mat_log** | 1 | `linalg/polar.rs`, `linalg/matfn.rs` | ~400 |
+| 9 | **A6 — Polar + mat_exp + mat_log + distance** | 1 | `linalg/polar.rs`, `linalg/matfn.rs`, `linalg/distance.rs` (absorbs PR #160 `heel_f64x8::l1/l2/linf`) | ~500 |
 | 10 | **A7 — SH deg 0..=7** | 1 | `linalg/sh.rs` (supersedes `splat3d/sh.rs`) | ~400 |
 | 11 | **A8 — Conv1D + Conv2D** | 1 | `linalg/conv.rs` | ~450 |
 | 12 | **A9 — Batched gemm + Norms + Activations** | 1 | `linalg/batched.rs`, `linalg/norm.rs`, `linalg/activations_ext.rs` | ~550 |
@@ -516,7 +537,7 @@ Plus parity gates:
 
 3. **f64 path?**: splat3d is f32-only. Inference modules are f32. Pillar probes use f64 internally for concentration math. Does `linalg-core` ship f32 AND f64? Lean: **f32 primary** (matches the rest of `hpc::*`), add `_f64` variants only on demand. Savant: rule on whether to pre-ship f64 for the Pillar consumers.
 
-4. **`jc` consolidation path (a) vs (b)**: keep jc zero-dep on ndarray (path a) or relax for SPD only (path b)? Architectural call. Lean: **(a)** preserves the self-certifying property. Coordinator: confirm with jc-architect before committing.
+4. **`jc` consolidation path (a) vs (b)**: ~~keep jc zero-dep (path a) or relax for SPD only (path b)?~~ **RESOLVED by joint savant P1-1 + invariant 12 (master ruling): path (b) — jc's math consolidates into `ndarray::hpc::pillar::*`. PR-X10 does not decide this; it ships the canonical ndarray-side surface that PR-X11 then consumes.** See §"PR-X11 consumption pattern" L390 above.
 
 5. **Flash-attention as v1 or v2?**: flash-attention is ~3× the implementation complexity of naive attention. v1 ships naive only; v2 adds flash. OR v1 ships both. Lean: **v1 ships both** — the inference modules need flash for any sequence longer than ~512 tokens. Cost: ~250 extra LoC on A10.
 
diff --git a/.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md b/.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md
index 245de223..2c9f9362 100644
--- a/.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md
+++ b/.claude/knowledge/stack-consolidation-bardioc-to-hhtl.md
@@ -90,16 +90,134 @@ aggregate-scan queries. So the ClickHouse strength is irrelevant, not absent.
 
 ## Zone model
 
-| Zone | Layer | Role | Boundary contract |
+| Zone | Column-state phase | Surface that watches | What "being in this zone" means |
 |---|---|---|---|
-| **Zone 1** (hot / in-process) | lance-graph + ndarray + Ractor | cognitive shader stack, Rubicon gates, HHTL cascade | typed surfaces, no serde, Rule #3 territory |
-| **Zone 2** (warm / persistence) | SurrealDB (+ Tantivy FTS) | cognitive system's own state — committed outcomes only | typed surfaces in, ACID-tx out |
-| **Zone 3** (cold / egress) | sea-orm | legacy SQL bridge (PostgreSQL, MySQL, host org's DB) | DTOs / SQL rows — materialization happens here |
-
-**Ractor lives at zone boundaries**, never inside the zone-1 cascade. Actors are
-the gates between deliberation and persistence (1↔2) and between persistence
-and legacy egress (2↔3). Inside zone 1, the cascade is pure function composition
-over typed surfaces.
+| **Zone 1** (hot) | `committed = false`, currently held in mailbox-cycle scope | lance-graph cascade ops | the row is being deliberated; cascade compute is in-flight against the same bytes a future Zone-2 reader will see |
+| **Zone 2** (warm) | `committed = true`, Lance-versioned | SurrealDB LIVE subscriptions, lance-graph reads | the row's truth-value crossed the Rubicon; any LIVE watcher with a matching predicate observes the flip as a column-state transition |
+| **Zone 3** (cold) | `egressed_at IS NOT NULL`, mirrored once | sea-orm to legacy RDBMS | the row has been materialised into PG-shape for the legacy surface; the source Lance bytes are unchanged |
+
+**Zones are temporal phases of column state on a single Lance dataset, not
+storage tiers.** Same physical bytes throughout. A row does not "move" from
+zone 1 to zone 2; a column flips from `committed = false` to `true`, and
+the LIVE watchers notice. There is no serialise / marshal / wire-format
+step between strata because there are no strata — there is one Lance
+dataset, multiple state-flag columns, and multiple dialect surfaces reading
+the same buffers.
+
+This is the right framing for the Rubicon model: the crossing is a *column
+flip*, not a write event. There is no "mailbox in RAM commits to
+SurrealDB" — SurrealDB always saw the row, the row just changed state. The
+mailbox-cycle still governs the commit (the handler decides when to flip
+the flag, and `&mut self` there is the gated write), but the flip itself
+is a state transition on bytes that didn't move.
+
+What stays true from earlier framings:
+- The cascade inside a single handler body is pure function composition over
+  typed surfaces (Rule #3 territory)
+- The `&mut self` in the handler IS the gated write — legitimate because it
+  IS the Rubicon crossing (the column flip), not "during computation"
+- Typed surfaces at the dialect interfaces (SurrealQL parses to column
+  predicates; sea-orm projects to legacy DTOs; Databend pushes filters to
+  column kernels) — but these are *type-level* contracts on how each
+  dialect reads the same bytes, not perimeters around different stores
+
+See § "Column-substrate identity" below for the full unification.
+
+## Column-substrate identity — Lance ≡ Arrow ≡ ndarray SoA
+
+```
+Lance dataset (single physical store)
+     │
+     ▼
+Lance column  ≡  Arrow column buffer  ≡  ndarray SoA
+                  (one representation, all the way down)
+     │
+     ├──→ lance-graph: XOR-cascade lookups, cognitive-shader cycles
+     │       (ndarray SIMD ops directly over the column bytes;
+     │        no copy, no serde, no marshal — the "in-RAM Thought"
+     │        IS the Lance column slot)
+     │
+     ├──→ SurrealDB: SurrealQL parses → reads the same column
+     │       LIVE subscription = a watch on column-state predicates
+     │
+     ├──→ sea-orm: SQL via Lance backend → reads the same column
+     │       (Zone-3 egress is materialise-once into PG-shape for the
+     │        legacy surface; the source bytes are unchanged)
+     │
+     ├──→ Databend: analytic SQL → reads the same column
+     │       (ndarray::simd kernel swap → operates on the same bytes
+     │        the cognitive cascade just operated on)
+     │
+     └──→ Tantivy: FTS index → built over the same column
+```
+
+**One physical representation, end to end.** The Lance column layout, the
+Arrow column buffer layout, and the ndarray SoA layout are the same bytes
+viewed through three names. The five dialect surfaces (lance-graph cascade,
+SurrealDB, sea-orm, Databend, Tantivy) all parse their respective query
+languages down to operations on those same bytes.
+
+**ndarray amortises the SIMD primitive across the whole stack.** The kernel
+that runs the cognitive cascade is the same kernel that Databend's filter
+pushdown invokes, that Tantivy's indexer reads through, and that sea-orm
+projects from for legacy egress: one kernel on the same bytes. ndarray pays
+for the SIMD primitive once and the entire stack collects rent. No transcode
+tier, no copy boundary, no format conversion at any zone.
+
+**Rubicon = column-state flip, not write event.** A Thought is a Lance row
+from the moment it is allocated to the moment it is queried by any surface.
+"Crossing the Rubicon" means flipping (e.g.) `committed: false → true` —
+versioned natively by Lance, observed by any LIVE watcher with a matching
+predicate, no serialisation involved.
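
The "zones are phases of column state" idea can be shown with a toy struct-of-arrays table. Plain `Vec` columns stand in for Lance/Arrow buffers here, and every name (`ThoughtColumns`, `Zone`, `commit`, `watch_committed`, `egress`) is hypothetical. The point is only that the zone is derived from flag columns and that a commit never touches the payload buffer.

```rust
/// Toy SoA table: one Vec per column, one index per row.
#[derive(Default)]
pub struct ThoughtColumns {
    pub truth: Vec<f32>,
    pub committed: Vec<bool>,
    pub egressed_at: Vec<Option<u64>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Zone {
    Hot,
    Warm,
    Cold,
}

impl ThoughtColumns {
    /// Allocate a row; it starts uncommitted (zone 1).
    pub fn push(&mut self, truth: f32) -> usize {
        self.truth.push(truth);
        self.committed.push(false);
        self.egressed_at.push(None);
        self.truth.len() - 1
    }

    /// The zone is computed from column state; nothing is stored per zone.
    pub fn zone(&self, row: usize) -> Zone {
        match (self.committed[row], self.egressed_at[row]) {
            (_, Some(_)) => Zone::Cold,
            (true, None) => Zone::Warm,
            (false, None) => Zone::Hot,
        }
    }

    /// Rubicon crossing: flip one flag. The `truth` column is not touched.
    pub fn commit(&mut self, row: usize) {
        self.committed[row] = true;
    }

    /// Record a one-time egress timestamp for the row.
    pub fn egress(&mut self, row: usize, at: u64) {
        self.egressed_at[row] = Some(at);
    }

    /// LIVE-style watcher: rows whose state matches `committed = true`.
    pub fn watch_committed(&self) -> Vec<usize> {
        (0..self.committed.len())
            .filter(|&r| self.committed[r])
            .collect()
    }
}
```

In this toy model a row passes through all three zones while its payload stays at the same address, which is the claim the section makes about the real substrate.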
+
+### What this dissolves
+
+| Earlier framing (wrong) | Why it's wrong |
+|---|---|
+| "Mailbox writes to SurrealDB on Rubicon crossing" | There is no write — SurrealDB always saw the row; the row just changed state |
+| "MvccProvider::snapshot_ts threads across engines" | There is one Lance dataset with one version chain; all readers see the same version |
+| "surrealdb-ractor as cf-event router" | No cf-event-as-message needed; mailboxes already share the same column slice that SurrealDB watches |
+| "sea-orm-ractor entity-actor dispatch by PK" | The mailbox IS the row; no separate dispatch layer |
+| "Zone 1 in-process vs Zone 2 durable" (as storage tiers) | Same physical bytes; zones are temporal phases of column state, not storage tiers |
+| "TiKV as routing / coordination layer" | TiKV ranges are Lance dataset shards under the XOR cascade — substrate, not routing |
+| "kv-lance translates records into Lance rows for SurrealDB" | No translation; SurrealQL parses directly against Lance columns that lance-graph already owns |
+
+### What survives — JITson / Cranelift, cleaner than before
+
+The compile-time → JIT pipeline does not collapse with the framing — it
+sharpens:
+
+- **ndarray SoA layout = Lance column layout = known at OGIT-schema-compile time.**
+  The schema fixes the column shape; everything downstream specialises against it.
+- **`DeriveEntityModel` (or equivalent) emits column-typed accessors at Rust
+  compile time** — typed handles into the same bytes for each dialect surface.
+- **Cranelift JITs hot-path kernels specialised for the OGIT-derived column
+  types at first call** — predicate compilation, projection compilation,
+  cascade-step compilation, all against the typed column shape.
+- **"Sinkin becomes compile next time"** — when a new column shape enters the
+  substrate (ontology evolution), the next compile cycle regenerates the typed
+  accessors and the JIT re-specialises against the new shape.
+- **All five dialect surfaces automatically inherit the new kernels** because
+  they all operate on the same column layout. Add a column → all surfaces
+  see it. Specialise a kernel → all surfaces use it.
+
+### Implication for the four-tier picture
+
+The four-tier picture earlier in this doc names `ndarray::simd` as "the
+common SIMD substrate across all four tiers". That claim is correct, but
+its load-bearing reason is the column-substrate identity, not "we happen
+to use the same SIMD library in four places". The deeper fact:
+
+> **The column IS the SoA IS the ndarray buffer.** The cognitive cascade,
+> the analytic scan, the FTS index build, and the graph traversal all
+> operate on the same bytes through the same SIMD kernels. ndarray::simd
+> is the common substrate because the substrate is genuinely one thing,
+> not four parallel things wearing the same uniform.
+
+This is the actually-clean Foundry-aspiring shape: one physical store, one
+column layout, one kernel set, multiple dialect surfaces. The "same data,
+different syntax" claim is finally literal — not "same schema across
+translation layers" but **same bytes, period.**
 
 ## Rule #3 ⊕ Rubicon ⊕ Per-thought bindspace (three-legged stool)
 
@@ -204,7 +322,7 @@ depending on workload count.
 | HHTL distribution math is wrong | High | This is the load-bearing claim; numerical certification (PR-X11 pillars) covers cascade ops; add formal proofs for the XOR-projection bijectivity property before zone-2 commit |
 | 90° vector / Walsh-Hadamard basis breaks for non-projectable queries | High | API enforces "queries must be expressible in basis"; queries that aren't are bounced back to the caller with a typed error, not silently scanned |
 
-## Click-moments inventory (the three architectural dissolutions)
+## Click-moments inventory (the four architectural dissolutions)
 
 These are the moments where a perceived problem turned out to not be a problem:
 
@@ -215,17 +333,32 @@ These are the moments where a perceived problem turned out to not be a problem:
 2. **Ractor `&mut self` violated Rule #3** → **Rubicon model shows actors are
    commitment gates, not shared-state mutators.** The handler body IS the
    Rubicon crossing; `&mut self` there is the gated write, not "during
-   computation". Dual to Rule #3, not opposed.
+   computation". Dual to Rule #3, not opposed. **Refinement** (2026-05-19,
+   post-PR #404 rollback): the mailbox carries the commitment responsibility
+   implicitly, so there is no physical boundary between zones 1/2/3 for
+   actors to "live at" — Rubicon is per-mailbox-commit-cycle, distributed
+   everywhere there is a handler.
 
 3. **ClickHouse OLAP gap blocked the new stack** → **HHTL shows the cognitive
    workload doesn't need OLAP, just project-and-lookup.** ClickHouse stays in
    Bardioc and is decommissioned when the last scan-aggregate query is ported
    (which is never, because cognitive queries don't have that shape).
 
-All three dissolutions are structural — they don't require new code, they
+4. **Multi-store consistency / cross-zone messaging looked like the hard
+   coordination problem** → **Column-substrate identity shows there is no
+   cross-zone messaging.** Lance column ≡ Arrow buffer ≡ ndarray SoA, same
+   bytes for every dialect surface (lance-graph, SurrealDB, sea-orm, Databend,
+   Tantivy). Rubicon is a column-state flip, not a write event. SurrealDB
+   LIVE subscriptions watch column predicates on bytes they were already
+   reading. The hard problem dissolves because there were never multiple
+   stores to keep consistent. See § "Column-substrate identity" above.
+
+All four dissolutions are structural — they don't require new code, they
 require seeing the existing architecture through the correct frame. That's why
 they "click hard": the answer was already in the design; it just needed the
-right name.
+right name. Dissolutions 1–3 are workload-shape dissolutions; #4 is the
+substrate-identity dissolution and is the deepest of the four — it makes
+the other three's "no copy, no marshal, no coordination" claims literal.
 
 ## What's NOT covered by this consolidation
 
@@ -243,6 +376,158 @@ Honesty roster — things that genuinely don't fit and need separate stories:
   the typed-surface contract from zone 1 means schema changes ripple through
   Rust types. Migration discipline TBD.
 
+## Four-tier picture: HHTL only has to win the cognitive layer
+
+The synthesis that compresses the whole consolidation arc to one diagram.
+Bardioc had four CPU-heavy specialty layers. **Three of them already have
+Rust-native successors that aren't HHTL.** HHTL only has to win at the
+cognitive layer it was designed for.
+
+| Tier | Workload shape | Bardioc layer | Rust-native successor | Acceleration |
+|---|---|---|---|---|
+| **Cognitive** (hot path, sub-µs) | project-and-lookup, cascade routing | (no Bardioc analog — application code) | **HHTL** = TiKV + SurrealDB + Ractor + ndarray + lance-graph | ndarray::simd (native), HHTL projection (2 orders of magnitude vs scan) |
+| **Analytic** (cold path, ms) | scan-and-aggregate, OLAP, ad-hoc SQL | ClickHouse | **Databend** (Arrow + DataFusion + Tokio, Apache-2.0 core, ClickHouse-shape) | ndarray::simd injection (filter / aggregate / hash kernels) |
+| **Search** (full-text) | inverted index, BM25, ngram, faceting | Elasticsearch / Lucene | **Tantivy** (under SurrealDB FTS, also via Quickwit) | ndarray::simd injection (bitpack decode / BM25 / skip-list intersection) |
+| **Graph** (traversal) | BFS/DFS, edge-label filter, frontier expansion | JanusGraph | **lance-graph native** (typed surfaces over TiKV ranges) | ndarray::simd (frontier bitsets), no JNI |
+
+This is the load-bearing reframe for the migration argument:
+
+- **HHTL is genuinely new IP** — nothing exists like it; the cognitive layer
+  is where the architecture earns its keep.
+- **The other three are inheritances** — Databend / Tantivy / lance-graph are
+  pre-existing Rust-native engines that already do what their Bardioc
+  counterparts did, just in Rust with Tokio and Arrow.
+- **ndarray::simd is the common SIMD substrate across all four tiers** —
+  injection target for Databend + Tantivy (the trojan-horse prompts);
+  native for HHTL + lance-graph (the hot-path cognitive substrate).
+
+Migration scope shrinks proportionally. The total work is:
+1. Build HHTL (PR-X4 + PR-X9, the genuinely new piece).
+2. Adopt Databend, Tantivy, lance-graph (existing, just integrate).
+3. Inject ndarray::simd into Databend + Tantivy (trojan horse prompts,
+   1–2 engineer-weeks per target).
+4. Cutover from Bardioc one workload at a time (read-only mirror → dual
+   write → primary-flip → decommission, the existing migration plan).
+
+**No transcode of ClickHouse, ES, or JanusGraph is required, ever.**
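+
+The per-workload cutover in step 4 can be sketched as a small state machine.
+This is an illustration under assumptions, not the existing migration plan's
+code: the type names are hypothetical, and the rule that the successor
+becomes primary exactly at `PrimaryFlip` is one reading of the stage names.
+
+```rust
+/// Which engine is authoritative for a workload (hypothetical naming).
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum Engine {
+    Legacy,    // the Bardioc layer
+    Successor, // the Rust-native replacement
+}
+
+/// Cutover stages in the order given in step 4.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum CutoverStage {
+    ReadOnlyMirror,
+    DualWrite,
+    PrimaryFlip,
+    Decommissioned,
+}
+
+impl CutoverStage {
+    /// Stages only move forward; `Decommissioned` is terminal.
+    pub fn next(self) -> Option<CutoverStage> {
+        match self {
+            CutoverStage::ReadOnlyMirror => Some(CutoverStage::DualWrite),
+            CutoverStage::DualWrite => Some(CutoverStage::PrimaryFlip),
+            CutoverStage::PrimaryFlip => Some(CutoverStage::Decommissioned),
+            CutoverStage::Decommissioned => None,
+        }
+    }
+
+    /// Assumption: the legacy engine stays primary until the flip.
+    pub fn primary(self) -> Engine {
+        match self {
+            CutoverStage::ReadOnlyMirror | CutoverStage::DualWrite => Engine::Legacy,
+            CutoverStage::PrimaryFlip | CutoverStage::Decommissioned => Engine::Successor,
+        }
+    }
+
+    /// The legacy engine keeps running until the final stage.
+    pub fn legacy_running(self) -> bool {
+        self != CutoverStage::Decommissioned
+    }
+}
+```
+
+Tracking one `CutoverStage` per workload keeps "one workload at a time"
+explicit: a tier can be decommissioned only when every workload on it has
+reached `Decommissioned`.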
+
+## Why we don't transcode ClickHouse (cheap escape hatches)
+
+A full ClickHouse transcode is one of the hardest software undertakings in
+modern infrastructure: ~1.2M LOC C++ core, ~150 vendored libraries, ~1000
+hand-tuned aggregation/scalar functions, decades of SIMD/cache/JIT
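+optimization. Realistic cost: **5–10 engineer-years**. Reference points
+(none is a literal transcode, but all are multi-year systems builds): TiKV
+was written in Rust from the start and still took years to mature; Servo,
+Mozilla's from-scratch Rust browser engine, ran for roughly a decade without
+replacing Gecko wholesale; CockroachDB's PostgreSQL compatibility is still
+partial after a decade.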
+
+Three cheaper escape hatches, in order of cost:
+
+| Approach | Cost | Outcome |
+|---|---|---|
+| **A. FFI inject ndarray::simd into ClickHouse** (trojan horse prompt) | 1–2 engineer-weeks | ClickHouse stays C++, hot kernels are Rust; legacy stack faster, Bardioc cutover urgency reduced |
+| **B. Transcode only the vectorized executor** (~50–100k LOC) | 1–2 engineer-years | Hybrid C++ shell + Rust executor core; deep IP investment, narrow scope |
+| **C. Adopt Databend + ndarray::simd injection** (databend prompt) | 0 transcode | Rust-native, ClickHouse-shape, Apache-2.0 licensed core, already maintained, rides upstream |
+
+**Recommended: C.** Databend already covers ClickHouse-shape workloads in
+Rust on Arrow + DataFusion + Tokio. ndarray::simd injection earns the
+"hand-tuned" performance parity. Combined cost is engineer-weeks, not
+engineer-years, with zero transcode debt.
+
+A is also valuable in parallel — it accelerates Bardioc during the cutover
+window and creates upstream contribution opportunities. B is rarely worth
+it; only justified if you need ClickHouse-storage-format wire-compatibility
+in a Rust-native engine, which the cognitive stack does not.
+
+The closest C# ecosystem analog: RavenDB is the nearest
+single-binary-vendor-everything counterpart to ClickHouse in .NET, with
+EventStoreDB second. Neither is performance-competitive with ClickHouse on
+OLAP scan, but they share the operational philosophy. Notable because the
+ClickHouse design pattern (full vendoring + native compilation +
+obsessive SIMD/cache tuning + willingness to patch upstream) is rare —
+ClickHouse may be the only OSS database that does all four. Yandex
+heritage is what made it possible.
+
+## Salvage from the 2026-05-19 cross-repo rollback (PR #404 / PR #160)
+
+The four-repo demo PR #404 in lance-graph (and its companion ndarray PR #160)
+was reverted via PR #405 on 2026-05-19. The architectural intent is
+preserved as a next-cycle target; the code attempt was withdrawn. Two pieces
+of that work are NOT dead and have their re-entry points named here so the
+next-cycle implementation doesn't lose them:
+
+### 1. `heel_f64x8::{l1, l2, linf}_f64_simd` → PR-X10 A6 `linalg::distance`
+
+The distance kernels themselves are correct; the framing was wrong (filed as
+"Sprint 0a of a four-repo integration arc" with cross-repo coupling that
+made the rollback unavoidable). The same code re-emerges as
+`ndarray::hpc::linalg::distance::{l1, l2, linf}_f64_simd` under worker A6
+in PR-X10 — the polar.rs / matfn.rs neighbourhood. Bench parity vs the
+PR #160 implementation is part of A6's acceptance gate. See
+`pr-x10-linalg-core-design.md` § "Distance kernels — `linalg::distance`".
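+
+For the bench-parity gate, a scalar reference is a useful oracle. The sketch
+below gives plain-Rust L1, L2, and L∞ distances; the `*_scalar` names are
+hypothetical and this is not the `heel_f64x8` or `linalg::distance` code,
+only the textbook definitions a SIMD kernel would be checked against.
+
+```rust
+/// L1 (Manhattan) distance: sum of absolute differences.
+pub fn l1_scalar(a: &[f64], b: &[f64]) -> f64 {
+    assert_eq!(a.len(), b.len(), "inputs must have equal length");
+    a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum()
+}
+
+/// L2 (Euclidean) distance: square root of the sum of squared differences.
+pub fn l2_scalar(a: &[f64], b: &[f64]) -> f64 {
+    assert_eq!(a.len(), b.len(), "inputs must have equal length");
+    a.iter()
+        .zip(b)
+        .map(|(x, y)| (x - y) * (x - y))
+        .sum::<f64>()
+        .sqrt()
+}
+
+/// L-infinity (Chebyshev) distance: largest absolute difference.
+pub fn linf_scalar(a: &[f64], b: &[f64]) -> f64 {
+    assert_eq!(a.len(), b.len(), "inputs must have equal length");
+    a.iter()
+        .zip(b)
+        .map(|(x, y)| (x - y).abs())
+        .fold(0.0, f64::max)
+}
+```
+
+A SIMD kernel may sum in a different order than the scalar loop, so a parity
+check should compare L1 and L2 within a small tolerance rather than
+bit-for-bit; L∞ involves no accumulation and should match exactly.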
+
+### 2. `lance-graph-contract::{ir, provider, actor}` → mostly redundant, except…
+
+The IR / provider types (`Operator`, `Cardinality`, `EngineHint`,
+`MvccProvider`) duplicate work the HHTL arc covers natively — they don't
+re-emerge. They're correctly dead.
+
+**Exception: `SupervisableShader` + `RestartBackoff`** have a future as
+*mailbox-cycle commitment-gate primitives* on Ractor actors. **Important
+framing refinement** (2026-05-19, post-rollback session): with the Rubicon
+model, the mailbox itself carries the commitment responsibility, so the
+gate fires *per-message-commit-cycle*, not at a physical zone-1↔zone-2
+boundary. The earlier framing ("they only fire at zone 1↔2 transitions")
+was a category error: zones 1/2/3 are a *logical* stratification of where
+state physically lives, not perimeter walls that actors cross. There is no
+physical boundary because the mailbox IS the Rubicon.
+
+What this means concretely for the next-cycle implementation, **under the
+column-substrate-identity framing** (see § "Column-substrate identity"
+above):
+
+- `SupervisableShader` is the supervisor-aware wrapper around a Ractor
+  handler that owns a *column-flip cycle* (read column → compute → flip
+  state-flag → reply / drop). Its "supervision boundary" is the
+  flip-cycle, not a perimeter between stores — because there is no second
+  store. SurrealDB / sea-orm / Databend / Tantivy are dialect surfaces on
+  the same Lance column the handler is operating on.
+- `RestartBackoff` governs how the supervisor responds when a flip-cycle
+  panics or returns an error before the flag is set. It gates *retry
+  attempts on the same column flip*, not retries across physical
+  infrastructure. The Lance version chain provides the natural retry
+  semantics (the flip either landed-and-committed in the version chain or
+  it didn't; SurrealDB LIVE watchers only see committed flips).
+- Both primitives are stateless types that live in `lance-graph` (the
+  thinking layer); they don't belong in ndarray (the hardware layer).
+- Re-entry point: a future PR-X14 or sibling sprint in the lance-graph
+  repo that introduces `LanceActor`-shaped wrappers for the canary's
+  commit path (see `hhtl-canary-inhabitance-plan.md` step 9 — the natural
+  first consumer is the NARS-revision handler that flips a
+  `revised: false → true` column on the belief row).
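+
+The flip-cycle semantics in the bullets above can be sketched as follows.
+Everything here is an assumption made for illustration: the real
+`RestartBackoff` fields are not given in this document, `BeliefRow` and
+`run_flip_cycle` are hypothetical names, and the exponential-with-cap policy
+is one common choice rather than the withdrawn PR's behaviour. No Ractor
+types are used.
+
+```rust
+use std::time::Duration;
+
+/// Hypothetical stateless retry policy (fields are assumptions).
+#[derive(Debug, Clone, Copy)]
+pub struct RestartBackoff {
+    pub base: Duration,
+    pub max: Duration,
+    pub max_retries: u32,
+}
+
+impl RestartBackoff {
+    /// Delay before retry number `attempt` (0-based); `None` once exhausted.
+    pub fn delay(&self, attempt: u32) -> Option<Duration> {
+        if attempt >= self.max_retries {
+            return None;
+        }
+        // Exponential growth, capped at `max`; overflow saturates.
+        let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
+        Some(self.base.saturating_mul(factor).min(self.max))
+    }
+}
+
+/// Hypothetical belief row with the `revised` state flag.
+pub struct BeliefRow {
+    pub confidence: f64,
+    pub revised: bool,
+}
+
+/// One flip cycle: read column -> compute -> flip flag.
+/// The flag is set only if `compute` succeeds; retries target the same row.
+/// Returns the number of retries that were needed.
+pub fn run_flip_cycle<F>(
+    row: &mut BeliefRow,
+    policy: &RestartBackoff,
+    mut compute: F,
+) -> Result<u32, String>
+where
+    F: FnMut(f64) -> Result<f64, String>,
+{
+    let mut attempt = 0;
+    loop {
+        match compute(row.confidence) {
+            Ok(v) => {
+                row.confidence = v;
+                row.revised = true; // the flip
+                return Ok(attempt);
+            }
+            Err(e) => match policy.delay(attempt) {
+                // A real supervisor would wait `_delay` before retrying.
+                Some(_delay) => attempt += 1,
+                None => return Err(e),
+            },
+        }
+    }
+}
+```
+
+Because the flag is written only on the success path, a cycle that exhausts
+its retries leaves `revised` untouched, which mirrors the "landed or didn't"
+property the bullets attribute to the Lance version chain.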
+
+The "no physical boundary 1/2/3 — and no second store either" insight is
+captured as the **fourth click-moment** in the Click-moments inventory
+above. Click-moments 1–3 were workload-shape dissolutions; #4 is the
+substrate-identity dissolution and is the deepest of the four. The
+SupervisableShader + RestartBackoff primitives can be small (~50 LoC each)
+because they encode column-flip-cycle semantics, not cross-store
+plumbing.
+
+### Lesson for future cross-repo arcs
+
+PR #404's failure mode was not bad code — it was a four-repo coupling
+filed as a single arc, which made the rollback inherently cross-repo and
+the merge-window inherently fragile. The architecturally-equivalent work
+re-enters as multiple single-repo PRs across the Phase 2 schedule
+(PR-X10 absorbs the distance kernels; a future PR-X14 absorbs the
+column-flip-cycle primitives). The next-cycle architectural target — the
+four-repo integration demo — happens *after* the canary lands in W8,
+not before. Integration depends on substrate, not vice versa. And under
+the column-substrate-identity framing, "integration" mostly means "wire
+the dialect surfaces to read the columns the canary writes" — there is
+no marshal layer to build.
+
 ## References
 
 - `pr-master-consolidation.md` — sprint plan, 10-submodule layout
@@ -253,6 +538,10 @@ Honesty roster — things that genuinely don't fit and need separate stories:
 - `pr-x11-jc-consolidation-design.md` — numerical certification (cascade ops)
 - `pr-x12-codec-x265-design.md` — compressed leaf storage
 - `pr-x13-ogit-bridge-design.md` — OGIT TTL bundle (ontology grounding)
-- `bardioc-weekend-rebuild-prompt.md` — migration baseline prompt
+- `bardioc-weekend-rebuild-prompt.md` — migration baseline prompt (build the old stack honest)
+- `ndarray-simd-trojan-horse-prompt.md` — inject ndarray::simd into ClickHouse + Tantivy (path A)
+- `databend-ndarray-simd-prompt.md` — adopt Databend + ndarray::simd as ClickHouse successor (path C, recommended)
+- `hhtl-canary-inhabitance-plan.md` — Phase 2 entry condition: names the NARS-revision canary + correctness/performance/inhabitance gates
+- `hhtl-substrate-execution-prompt.md` — Phase 2 Protocol A execution prompt (8 weeks, 6 sprints, 44 workers; per-sprint kickoff blocks for W1-W8)
 - `.claude/rules/data-flow.md` — Rule #3 source
-- lance-graph PR #404 — four-repo demo (architectural target)
+- lance-graph PR #404 — four-repo demo (architectural target; merge reverted via PR #405 in 2026-05-19 cross-repo rollback — intent preserved as next-cycle target, code attempt withdrawn)