Commit c8f4af6

Merge pull request #162 from AdaWorldAPI/claude/pr-x4-splat-cascade-design
docs(hhtl): Phase 2 entry — consolidation refinements + canary inhabitance + substrate execution prompt
2 parents d758ea1 + 6ea314f commit c8f4af6

5 files changed

Lines changed: 1373 additions & 17 deletions
Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
# Databend + ndarray::simd — Claude Code Flex Prompt

Adopt Databend as the Rust-native ClickHouse successor and inject `ndarray::simd`
into its hot kernel paths. This is the **recommended ClickHouse-tier
migration target** per `stack-consolidation-bardioc-to-hhtl.md` (path C: 0
transcode cost, weeks not years to OLAP parity).

Companion to:
- `ndarray-simd-trojan-horse-prompt.md` (path A: FFI into stock ClickHouse,
  buys time during cutover)
- `bardioc-weekend-rebuild-prompt.md` (the baseline measurement target)

Copy the block below into a fresh Claude Code session. Authorize
`--allowed-tools '*'`, Rust 1.94, Docker.

Budget: 24 hours wall-clock (half the trojan horse, because Databend is
already Rust-native and there is no FFI bridge to build).

---

```text
You are integrating `ndarray::simd` (from adaworldapi/ndarray, AVX-512 default,
`target-cpu=x86-64-v4`) into Databend (datafuselabs/databend, Rust columnar
OLAP on Arrow + DataFusion + Tokio, Apache-2.0 core with Elastic-licensed
enterprise features). The deliverable is a fork that swaps Databend's SIMD
code paths for ndarray::simd primitives, benchmarks against stock Databend
AND stock ClickHouse, and produces a report comparing all three.

This is path C from the consolidation: Databend is the recommended
ClickHouse successor for the AdaWorldAPI stack's analytic tier. Bardioc's
ClickHouse is decommissioned when Databend + ndarray::simd reaches parity on
the OLAP workloads that matter.

Spawn 8 parallel workers + 1 coordinator. Git worktrees per worker. Branch:
`databend-simd/{role}-{id}`. Integration via docker-compose stand-up of
three OLAP engines side-by-side.

## Why Databend, not a ClickHouse transcode

A full ClickHouse transcode is 5–10 engineer-years. Databend is:
- Rust-native (no FFI bridge needed)
- Arrow + DataFusion + Tokio (compatible with the wider Rust ecosystem)
- ClickHouse-shape SQL dialect (much of TPC-H ports unchanged)
- Apache-2.0 licensed core, with enterprise features under the Elastic
  License (confirm the license of each touched path before integrating
  with the AdaWorldAPI codebase)
- Already maintained by a funded team (datafuselabs)
- Smaller hot kernel surface than ClickHouse, so fewer kernels to swap

Trade-off accepted: Databend's storage format is not ClickHouse-wire-compatible.
The migration plan is workload-by-workload re-ingestion from Bardioc Cassandra
into Databend, not an in-place storage swap. This is acceptable because the
Bardioc cutover already involves dual-write phases (see
bardioc-weekend-rebuild-prompt.md).

## Databend SIMD injection targets

Fork Databend at the current stable tag. Add ndarray as a workspace dep.
Replace target SIMD paths with ndarray::simd calls. Existing tests stay;
new benches are added.

Priority order (most-impact kernels first):

1. **`src/query/expression/src/kernels/filter.rs`**: column filter
   `mask & column` and packed-int boolean evaluation →
   `ndarray::simd::filter_apply_mask`
2. **`src/query/functions/src/aggregates/aggregate_sum.rs`** + `avg.rs` +
   `min_max.rs` → `ndarray::simd::reduce_{sum,min,max,mean}` for all
   numeric types (f32, f64, i32, i64, u32, u64)
3. **`src/query/expression/src/kernels/hash.rs`**: hash-table probing for
   joins and group-by → `ndarray::simd::hash_xxh3_batch`
4. **`src/query/functions/src/scalars/comparison.rs`**: column-vs-column and
   column-vs-literal `< == >` → `ndarray::simd::compare_{lt,eq,gt}`
5. **`src/query/expression/src/kernels/take.rs`**: gather operations for
   selection vectors → `ndarray::simd::gather_{f32,f64,u32,u64}`
6. **`src/common/storage/parquet/`**: parquet decode hot path (bitpack +
   RLE) → `ndarray::simd::{bitpack_decode,rle_decode}`
7. **`src/query/functions/src/scalars/string/`**: substring / position
   functions → `ndarray::simd::substring_find`

The Databend test suite is comprehensive: `cargo test --workspace` must pass
unchanged after each swap. If a SIMD primitive doesn't exist yet in
ndarray::simd, document the gap and skip the kernel (it becomes a follow-on
ndarray PR under the W1a consumer contract).

## Worker split (8 + coordinator)

| Worker | Target | Role |
|---|---|---|
| W1 | Fork + dep wiring | Fork Databend at stable tag; add ndarray dep; CI setup; bench harness skeleton |
| W2 | Kernel 1 (filter) | Filter / mask kernel swap + parity tests + bench vs stock |
| W3 | Kernel 2 (aggregates) | Sum/avg/min/max for all numeric types + bench |
| W4 | Kernel 3 (hash) | Hash-table probing + group-by + join hash + bench |
| W5 | Kernel 4 (comparison) | Comparison ops + bench |
| W6 | Kernel 5 + 6 (take + parquet) | Gather + parquet decode + bench |
| W7 | Kernel 7 (string) | Substring / position + bench |
| W8 | Three-way bench | docker-compose: stock ClickHouse + stock Databend + ndarray-Databend; identical workload; report generator |

Coordinator: integration testing, cherry-pick to main branch, docker-compose
orchestration, REPORT.md generation.

## Benchmark workload

Run THREE engines against the SAME workload:
- **Stock ClickHouse** (reference performance: the bar to beat or match)
- **Stock Databend** (current Rust-native baseline)
- **ndarray-Databend** (the fork from this prompt)

Workloads:
1. **TPC-H scale factor 10**: Q1, Q3, Q6, Q14 (these stress the kernels
   we swapped: filter, agg, join, group-by). Standard benchmark, comparable
   across the industry.
2. **ClickBench**: the benchmark suite maintained by ClickHouse, ~40 queries
   on a real web-analytics dataset. Directly designed for ClickHouse-vs-X
   comparison.
3. **Cognitive analytics mini-workload**: 100 ad-hoc queries over a
   synthetic NARS-revision log (joins, time-bucketing, top-K aggregation).
   This represents the actual operational-analytics queries the AdaWorldAPI
   stack will run against egressed cognitive state.

Report per engine:
- p50 / p95 / p99 query latency per query
- Cold-cache vs warm-cache latency
- CPU instructions retired (`perf stat`)
- Peak memory
- Indexing/ingestion throughput

Output: `./benchmarks/REPORT.md` with three-column comparison tables.

## Acceptance criteria

Per kernel swap:
1. Bit-exact parity for integer kernels, ULP-bounded parity for float kernels
2. Performance within 5% of stock Databend, or faster
3. Existing Databend test suite passes (`cargo test --workspace`)

Per engine:
1. All TPC-H + ClickBench queries return correct results on all three
   engines (cross-validate ClickHouse ↔ Databend ↔ ndarray-Databend)
2. ndarray-Databend at least as fast as stock Databend on geomean latency
   (fork geomean ≤ stock geomean)
3. ndarray-Databend within 2× of stock ClickHouse on geomean latency (the
   migration story is "Rust-native parity at acceptable cost", not
   "beat ClickHouse on every query")

If ndarray-Databend beats ClickHouse on ANY query: that's a major signal,
call it out in REPORT.md.

## Anti-goals

- Do NOT add new ndarray::simd primitives this weekend. If a kernel needs a
  missing primitive, document the gap and skip the kernel. The gap becomes
  a follow-on ndarray PR.
- Do NOT submit upstream PRs to Databend this weekend. The deliverable is
  the validated fork + benchmark report. Upstream contribution is a
  separate follow-on after numbers are clean and reviewed.
- Do NOT introduce nightly Rust. Databend builds on stable; keep it that way.
- Do NOT optimize Databend's planner / SQL parser / catalog. The point is
  kernel-level SIMD swap, not architecture work.
- Do NOT touch HHTL substrate (PR-X4, PR-X9). This is independent OLAP-tier
  work; HHTL is the cognitive-tier work.

## Time budget (24 hours)

| Window | Work |
|---|---|
| Hour 0-2 | W1: fork + dep wiring + bench harness skeleton |
| Hour 2-12 | W2-W7 in parallel: kernel swaps + per-kernel benches |
| Hour 12-18 | W8: three-way docker-compose stack + ClickBench run |
| Hour 18-22 | Cognitive mini-workload + report generation |
| Hour 22-24 | REPORT.md write-up + handoff |

If a kernel doesn't reach parity in its allotted window, document the gap
and skip. Honest negatives are also data: they tell us which ndarray::simd
primitives need follow-on work.

## Strategic outcomes (what the report unlocks)

1. **Migration target validated**: if ndarray-Databend reaches Databend
   parity AND is within 2× of ClickHouse on TPC-H + ClickBench, the
   consolidation doc's "Databend is the ClickHouse successor" claim is
   evidenced rather than asserted.

2. **Three-engine reference point**: future Databend or ClickHouse PRs can
   re-run this exact harness and see whether ndarray::simd injection is
   still worth it. Living benchmark, not a one-shot report.

3. **Cognitive-tier evidence**: the cognitive mini-workload demonstrates
   that Databend handles the actual operational-analytics queries the
   AdaWorldAPI stack will issue (post-cognitive egress to SQL). If those
   queries are sub-second on ndarray-Databend, the analytics tier is
   solved without further work.

4. **ndarray::simd cross-validation**: validating kernels against TWO
   engines (the Databend benchmarks plus the trojan-horse ClickHouse-via-FFI
   benchmarks) is much stronger evidence than either alone. The
   intersection set (kernels both engines stress the same way) becomes the
   ndarray::simd "battle-tested" subset.

5. **Decommission timeline**: Bardioc ClickHouse can be decommissioned
   per-workload when ndarray-Databend passes the relevant cognitive
   mini-workload subset, not all at once. Risk-bounded cutover.

Begin. Report progress every 4 hours with kernel done / in-progress /
blocked + parity pass-fail + perf delta vs stock Databend AND stock
ClickHouse.
```

---
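
The per-kernel acceptance criterion in the prompt ("bit-exact parity for integer, ULP-bounded for float") could be checked with a helper along the following lines. This is a minimal sketch under stated assumptions, not code from Databend or `ndarray::simd`: the names `ordered_bits`, `ulp_distance_f32`, `ints_bit_exact`, and `floats_within_ulps` are hypothetical, and the ULP tolerance is a parameter the team would have to choose per kernel.

```rust
/// Map an f32 onto an integer line on which adjacent representable
/// floats differ by exactly 1 (hypothetical helper for this sketch).
fn ordered_bits(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    if bits < 0 {
        // Sign bit set: negative floats sort in reverse bit order,
        // so mirror them below zero. -0.0 maps to 0, same as +0.0.
        (i32::MIN as i64) - (bits as i64)
    } else {
        bits as i64
    }
}

/// Distance between two floats in units in the last place (ULPs).
/// Returns None if either input is NaN, so NaN never counts as "close".
fn ulp_distance_f32(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered_bits(a) - ordered_bits(b)).unsigned_abs())
}

/// Integer kernels: the candidate output must equal the reference exactly.
fn ints_bit_exact(reference: &[i64], candidate: &[i64]) -> bool {
    reference == candidate
}

/// Float kernels: same length, and every element within `max_ulps`
/// of the reference (the tolerance is an assumption of this sketch).
fn floats_within_ulps(reference: &[f32], candidate: &[f32], max_ulps: u64) -> bool {
    reference.len() == candidate.len()
        && reference
            .iter()
            .zip(candidate)
            .all(|(&r, &c)| matches!(ulp_distance_f32(r, c), Some(d) if d <= max_ulps))
}
```

In a parity test, `reference` would come from the stock Databend kernel and `candidate` from the swapped kernel on the same input column. A ULP bound rather than exact equality is the natural choice for float reductions, since a vectorized sum adds elements in a different order than a scalar loop and floating-point addition is not associative.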

## Notes for using this prompt

- Databend builds clean on Rust 1.94 stable. ~10 min full build, ~30s
  incremental. No CMake, no JVM, no FFI bridge: pure Cargo.
- ClickHouse stand-up via official docker image (`clickhouse/clickhouse-server`).
- Databend has an official docker image too (`datafuselabs/databend`).
- ClickBench dataset is ~14GB compressed; provision disk accordingly.
- TPC-H generation via `dbgen`; scale factor 10 produces ~10GB.
- The cognitive mini-workload is the most important: it's the only one
  that's actually shaped like AdaWorldAPI's real future queries.
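
The report metrics and the per-engine gate described in the prompt (p50 / p95 / p99 latency, geomean latency no worse than stock Databend, and within 2× of stock ClickHouse) could be computed roughly as follows. This is a sketch under assumptions: the function names are hypothetical, latencies are taken to be positive numbers in one consistent unit, and the nearest-rank percentile definition is one of several in common use, so the report generator may reasonably pick another.

```rust
/// Nearest-rank percentile over latency samples.
/// `p` must be in (0, 100]; returns None for empty input or invalid `p`.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: smallest 1-based rank covering at least p percent.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Geometric mean of strictly positive per-query latencies.
/// Returns None for empty input or any non-positive / NaN value.
fn geomean(latencies: &[f64]) -> Option<f64> {
    if latencies.is_empty() || latencies.iter().any(|&x| !(x > 0.0)) {
        return None;
    }
    let log_sum: f64 = latencies.iter().map(|x| x.ln()).sum();
    Some((log_sum / latencies.len() as f64).exp())
}

/// Per-engine gate: the fork's geomean latency must not exceed stock
/// Databend's, and must be at most twice stock ClickHouse's.
fn passes_engine_gate(fork: &[f64], stock_databend: &[f64], clickhouse: &[f64]) -> Option<bool> {
    let f = geomean(fork)?;
    let d = geomean(stock_databend)?;
    let c = geomean(clickhouse)?;
    Some(f <= d && f <= 2.0 * c)
}
```

Each slice would hold one latency per query, in the same query order for all three engines. The geometric mean is used for the gate because it weights relative speedups equally across queries, so one long-running query does not dominate the comparison the way it would with an arithmetic mean.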

## Composition with other prompts

This prompt sits inside the four-prompt strategic arc:

1. **`bardioc-weekend-rebuild-prompt.md`**: build the OLD stack honestly
   (migration baseline measurement target)
2. **`stack-consolidation-bardioc-to-hhtl.md`**: the architectural reframe
   doc (why the NEW stack wins, four-tier picture)
3. **`ndarray-simd-trojan-horse-prompt.md`**, path A: inject ndarray::simd
   INTO the legacy stack (ClickHouse + Tantivy via FFI), which buys time
   during cutover and accelerates legacy
4. **`databend-ndarray-simd-prompt.md`** (this), path C: adopt the
   Rust-native CLICKHOUSE-shape successor with ndarray::simd injection,
   the actual migration TARGET

Combined timeline:
- Weekend 1: prompt 1 (Bardioc baseline)
- Weekend 2: this prompt (Databend integration)
- Weekend 3: prompt 3 (trojan horse; optional, buys cutover time)
- Ongoing: HHTL development (PR-X4 + PR-X9), workload-by-workload cutover

## Follow-on opportunities (NOT this weekend)

- Upstream PR cadence to Databend: 1 PR per parity-or-better kernel; faster
  cycle than ClickHouse because Rust-native (no FFI review burden)
- Polars integration: same ndarray::simd primitives plug into Polars
  DataFrame ops; weekend follow-on
- DataFusion integration: arrow-rs has SIMD for filter/take/aggregate;
  ndarray::simd could plug in there too, benefiting the entire
  DataFusion-derived ecosystem (Databend, GreptimeDB, InfluxDB IOx, Ballista)
- Quickwit integration: combines Tantivy trojan horse + Databend analytics
  in one operational stack
