# Databend + ndarray::simd: Claude Code Flex Prompt

Adopt Databend as the Rust-native ClickHouse successor and inject `ndarray::simd`
into its hot kernel paths. This is the **recommended ClickHouse-tier
migration target** per `stack-consolidation-bardioc-to-hhtl.md` (path C: 0
transcode cost, weeks not years to OLAP parity).

Companion to:
- `ndarray-simd-trojan-horse-prompt.md` (path A: FFI into stock ClickHouse,
  buys time during cutover)
- `bardioc-weekend-rebuild-prompt.md` (the baseline measurement target)

Copy the block below into a fresh Claude Code session. Authorize
`--allowed-tools '*'`, Rust 1.94, Docker.

Budget: 24 hours wall-clock (half the trojan-horse budget, because Databend is
already Rust-native and there is no FFI bridge to build).

---

```text
You are integrating `ndarray::simd` (from adaworldapi/ndarray, AVX-512 default,
`target-cpu=x86-64-v4`) into Databend (datafuselabs/databend, Rust columnar
OLAP with its own vectorized query engine on Arrow + Tokio, Apache 2.0
licensed core). The deliverable is a fork that swaps Databend's SIMD code
paths for ndarray::simd primitives, benchmarks against stock Databend AND
stock ClickHouse, and produces a report comparing all three.

This is path C from the consolidation: Databend is the recommended
ClickHouse successor for the AdaWorldAPI stack's analytic tier. Bardioc's
ClickHouse decommissions when Databend + ndarray::simd reaches parity on
the OLAP workloads that matter.

Spawn 8 parallel workers + 1 coordinator. Git worktrees per worker. Branch:
`databend-simd/{role}-{id}`. Integration via docker-compose stand-up of
three OLAP engines side-by-side.

## Why Databend, not transcode ClickHouse

Full ClickHouse transcode is 5–10 engineer-years. Databend is:
- Rust-native (no FFI bridge needed)
- Arrow + Tokio (compatible with the wider Rust ecosystem); the query
  engine is Databend's own, not DataFusion
- ClickHouse-shape SQL dialect (much of TPC-H ports unchanged)
- Apache 2.0 licensed core (clean integration with AdaWorldAPI codebase);
  enterprise features are under the Elastic License 2.0
- Already maintained by a funded team (datafuselabs)
- Smaller hot kernel surface than ClickHouse, so fewer kernels to swap

Trade-off accepted: Databend's storage format is not ClickHouse-wire-compatible.
The migration plan is workload-by-workload re-ingestion from Bardioc Cassandra
into Databend, not in-place storage swap. Acceptable because Bardioc cutover
already involves dual-write phases (see bardioc-weekend-rebuild-prompt.md).

## Databend SIMD injection targets

Fork Databend at the current stable tag. Add ndarray as a workspace dep.
Replace target SIMD paths with ndarray::simd calls. Existing tests stay;
add benches.

Priority order (most-impact kernels first):

1. **`src/query/expression/src/kernels/filter.rs`**: column filter
   `mask & column` and packed-int boolean evaluation →
   `ndarray::simd::filter_apply_mask`
2. **`src/query/functions/src/aggregates/aggregate_sum.rs`** + `avg.rs` +
   `min_max.rs` → `ndarray::simd::reduce_{sum,min,max,mean}` for all
   numeric types (f32, f64, i32, i64, u32, u64)
3. **`src/query/expression/src/kernels/hash.rs`**: hash-table probing for
   joins and group-by → `ndarray::simd::hash_xxh3_batch`
4. **`src/query/functions/src/scalars/comparison.rs`**: column-vs-column and
   column-vs-literal `< == >` → `ndarray::simd::compare_{lt,eq,gt}`
5. **`src/query/expression/src/kernels/take.rs`**: gather operations for
   selection vectors → `ndarray::simd::gather_{f32,f64,u32,u64}`
6. **`src/common/storage/parquet/`**: parquet decode hot path (bitpack +
   RLE) → `ndarray::simd::{bitpack_decode,rle_decode}`
7. **`src/query/functions/src/scalars/string/`**: substring / position
   functions → `ndarray::simd::substring_find`

Databend's test suite is comprehensive: `cargo test --workspace` must pass
unchanged after each swap. For SIMD primitives that don't exist yet in
ndarray::simd, document the gap and skip the kernel (it becomes a follow-on
ndarray PR under the W1a consumer contract).

## Worker split (8 + coordinator)

| Worker | Target | Role |
|---|---|---|
| W1 | Fork + dep wiring | Fork Databend at stable tag; add ndarray dep; CI setup; bench harness skeleton |
| W2 | Kernel 1 (filter) | Filter / mask kernel swap + parity tests + bench vs stock |
| W3 | Kernel 2 (aggregates) | Sum/avg/min/max for all numeric types + bench |
| W4 | Kernel 3 (hash) | Hash-table probing + group-by + join hash + bench |
| W5 | Kernel 4 (comparison) | Comparison ops + bench |
| W6 | Kernel 5 + 6 (take + parquet) | Gather + parquet decode + bench |
| W7 | Kernel 7 (string) | Substring / position + bench |
| W8 | Three-way bench | docker-compose: stock ClickHouse + stock Databend + ndarray-Databend; identical workload; report generator |

Coordinator: integration testing, cherry-pick to main branch, docker-compose
orchestration, REPORT.md generation.

## Benchmark workload

Run THREE engines against the SAME workload:
- **Stock ClickHouse** (reference performance: the bar to beat or match)
- **Stock Databend** (current Rust-native baseline)
- **ndarray-Databend** (the fork from this prompt)

Workloads:
1. **TPC-H scale factor 10**: Q1, Q3, Q6, Q14 (these stress the kernels
   we swapped: filter, agg, join, group-by). Standard benchmark, comparable
   across the industry.
2. **ClickBench**: the ClickHouse-maintained analytics benchmark, 43 queries
   on a real web-analytics dataset. Designed for ClickHouse-vs-X
   comparison.
3. **Cognitive analytics mini-workload**: 100 ad-hoc queries over a
   synthetic NARS-revision log (joins, time-bucketing, top-K aggregation).
   This represents the actual operational-analytics queries the AdaWorldAPI
   stack will run against egressed cognitive state.

Report per engine:
- p50 / p95 / p99 query latency per query
- Cold-cache vs warm-cache latency
- CPU instructions retired (`perf stat`)
- Peak memory
- Indexing/ingestion throughput

Output: `./benchmarks/REPORT.md` with three-column comparison tables.

## Acceptance criteria

Per kernel swap:
1. Bit-exact parity for integer kernels, ULP-bounded parity for float kernels
2. Performance within 5% of stock Databend, or faster
3. Existing Databend test suite passes (`cargo test --workspace`)

Per engine:
1. All TPC-H + ClickBench queries return correct results on all three
   engines (cross-validate ClickHouse ↔ Databend ↔ ndarray-Databend)
2. ndarray-Databend is at least as fast as stock Databend on geomean
   latency (fork geomean ≤ stock geomean)
3. ndarray-Databend within 2× of stock ClickHouse on geomean latency (the
   migration story is "Rust-native parity at acceptable cost", not
   "beat ClickHouse on every query")

If ndarray-Databend beats ClickHouse on ANY query: that's a major signal,
call it out in REPORT.md.

## Anti-goals

- Do NOT add new ndarray::simd primitives this weekend. If a kernel needs a
  missing primitive, document the gap and skip the kernel. The gap becomes
  a follow-on ndarray PR.
- Do NOT submit upstream PRs to Databend this weekend. The deliverable is
  the validated fork + benchmark report. Upstream contribution is a
  separate follow-on after numbers are clean and reviewed.
- Do NOT change Databend's Rust toolchain. Databend pins its toolchain in
  `rust-toolchain.toml` (a nightly channel, not stable); build the fork
  against that pin and do not enable additional unstable features.
- Do NOT optimize Databend's planner / SQL parser / catalog. The point is
  kernel-level SIMD swap, not architecture work.
- Do NOT touch HHTL substrate (PR-X4, PR-X9). This is independent OLAP-tier
  work; HHTL is the cognitive-tier work.

## Time budget (24 hours)

| Window | Work |
|---|---|
| Hour 0-2 | W1: fork + dep wiring + bench harness skeleton |
| Hour 2-12 | W2-W7 in parallel: kernel swaps + per-kernel benches |
| Hour 12-18 | W8: three-way docker-compose stack + ClickBench run |
| Hour 18-22 | Cognitive mini-workload + report generation |
| Hour 22-24 | REPORT.md write-up + handoff |

If a kernel doesn't reach parity in its allotted window, document the gap
and skip. Honest negatives are also data: they tell us which ndarray::simd
primitives need follow-on work.

## Strategic outcomes (what the report unlocks)

1. **Migration target validated**: if ndarray-Databend reaches Databend
   parity AND is within 2× of ClickHouse on TPC-H + ClickBench, the
   consolidation doc's "Databend is the ClickHouse successor" claim is
   evidenced rather than asserted.

2. **Three-engine reference point**: future Databend or ClickHouse PRs can
   re-run this exact harness and see whether ndarray::simd injection is
   still worth it. Living benchmark, not a one-shot report.

3. **Cognitive-tier evidence**: the cognitive mini-workload demonstrates
   that Databend handles the actual operational-analytics queries the
   AdaWorldAPI stack will issue (post-cognitive egress to SQL). If those
   queries are sub-second on ndarray-Databend, the analytics tier is
   solved without further work.

4. **ndarray::simd cross-validation**: kernels validated against TWO
   engines (Databend benchmarks plus the trojan-horse ClickHouse-via-FFI
   benchmarks) are much stronger evidence than either alone. The
   intersection set (kernels both engines stress the same way) becomes the
   ndarray::simd "battle-tested" subset.

5. **Decommission timeline**: Bardioc ClickHouse can be decommissioned
   per-workload when ndarray-Databend passes the relevant cognitive
   mini-workload subset, not all at once. Risk-bounded cutover.

Begin. Report progress every 4 hours with kernel done / in-progress /
blocked + parity pass-fail + perf delta vs stock Databend AND stock
ClickHouse.
```
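
The per-kernel acceptance criterion in the prompt above (bit-exact parity for integers, ULP-bounded parity for floats) can be sketched as a small checker. This is a minimal sketch under stated assumptions, not code from Databend or ndarray: the names `ulp_distance`, `ints_bit_exact`, `floats_within_ulps`, and `lane_sum` are hypothetical, the 4-ULP tolerance is a placeholder (the prompt does not fix a bound), and the eight-lane sum only stands in for a SIMD reduction that reassociates additions.

```rust
/// Map an f32 onto an ordered integer line so that adjacent
/// representable floats differ by exactly 1 (and -0.0 == +0.0).
fn ordered_bits(x: f32) -> i64 {
    let bits = x.to_bits() as i32;
    // Negative floats sort in reverse bit order, so mirror them.
    let ordered = if bits < 0 { i32::MIN - bits } else { bits };
    ordered as i64
}

/// Distance in units-in-the-last-place; `None` if either input is NaN.
fn ulp_distance(a: f32, b: f32) -> Option<u64> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    Some((ordered_bits(a) - ordered_bits(b)).unsigned_abs())
}

/// Integer kernels must match the reference exactly.
fn ints_bit_exact(reference: &[i64], candidate: &[i64]) -> bool {
    reference == candidate
}

/// Float kernels may differ by at most `max_ulps` per element.
/// NaN is accepted only where the reference is also NaN.
fn floats_within_ulps(reference: &[f32], candidate: &[f32], max_ulps: u64) -> bool {
    reference.len() == candidate.len()
        && reference.iter().zip(candidate).all(|(&r, &c)| match ulp_distance(r, c) {
            Some(d) => d <= max_ulps,
            None => r.is_nan() && c.is_nan(),
        })
}

/// Hypothetical stand-in for a SIMD reduction: eight independent
/// lane accumulators, combined at the end (reassociated addition).
fn lane_sum(values: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 8];
    for chunk in values.chunks(8) {
        for (lane, &v) in lanes.iter_mut().zip(chunk) {
            *lane += v;
        }
    }
    lanes.iter().sum()
}

fn main() {
    let data: Vec<f32> = (0..1000).map(|i| 0.1 * i as f32).collect();
    let scalar: f32 = data.iter().sum();
    let simd_like = lane_sum(&data);
    println!(
        "scalar = {scalar}, lane-wise = {simd_like}, ulps apart = {:?}",
        ulp_distance(scalar, simd_like)
    );
    println!("ints bit-exact: {}", ints_bit_exact(&[1, 2, 3], &[1, 2, 3]));
    println!(
        "floats within 4 ULPs: {}",
        floats_within_ulps(&[scalar], &[simd_like], 4)
    );
}
```

Reassociated reductions can legitimately drift by more than a few ULPs on long inputs, so the tolerance is something to choose per kernel and record in the report rather than a single global constant.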

---

## Notes for using this prompt

- Databend builds with the toolchain pinned in its `rust-toolchain.toml`
  (a nightly channel), so let `rustup` follow that pin rather than assuming
  stock Rust 1.94 stable. ~10 min full build, ~30s incremental. No CMake,
  no JVM, no FFI bridge: pure Cargo.
- ClickHouse stand-up via official docker image (`clickhouse/clickhouse-server`).
- Databend has an official docker image too (`datafuselabs/databend`).
- ClickBench dataset is ~14GB compressed; provision disk accordingly.
- TPC-H generation via `dbgen`; scale factor 10 produces ~10GB.
- The cognitive mini-workload is the most important: it's the only one
  that's actually shaped like AdaWorldAPI's real future queries.
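
The per-engine numbers the prompt asks for (p50 / p95 / p99 latency and the geomean acceptance gates) reduce to a few lines of arithmetic. The following is a hedged sketch, not the report generator itself: `percentile`, `geomean`, `Gate`, and `engine_gate` are hypothetical names, nearest-rank is one assumed percentile definition among several, and the latencies in `main` are made-up illustrative values, not measurements.

```rust
/// Nearest-rank percentile over latency samples (milliseconds).
/// Returns `None` for an empty sample or a `p` outside 0..=100.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Smallest value with at least p% of the samples at or below it.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Geometric mean; `None` if empty or any value is not strictly positive.
fn geomean(values: &[f64]) -> Option<f64> {
    if values.is_empty() || values.iter().any(|&v| !(v > 0.0)) {
        return None;
    }
    let log_sum: f64 = values.iter().map(|v| v.ln()).sum();
    Some((log_sum / values.len() as f64).exp())
}

/// Outcome of the two per-engine latency gates from the prompt.
#[derive(Debug, PartialEq)]
struct Gate {
    /// Fork geomean latency <= stock Databend geomean latency.
    at_least_stock_databend: bool,
    /// Fork geomean latency <= 2x stock ClickHouse geomean latency.
    within_2x_clickhouse: bool,
}

/// Each slice holds one latency per query, in the same query order.
fn engine_gate(clickhouse: &[f64], stock: &[f64], fork: &[f64]) -> Option<Gate> {
    let (ch, st, fk) = (geomean(clickhouse)?, geomean(stock)?, geomean(fork)?);
    Some(Gate {
        at_least_stock_databend: fk <= st,
        within_2x_clickhouse: fk <= 2.0 * ch,
    })
}

fn main() {
    // Illustrative per-query latencies in ms (invented for the example).
    let clickhouse = [120.0, 45.0, 300.0, 80.0];
    let stock = [210.0, 70.0, 520.0, 150.0];
    let fork = [180.0, 66.0, 410.0, 140.0];

    // Repeated runs of a single query, for the percentile columns.
    let runs = [14.0, 12.5, 13.1, 40.2, 12.9, 13.4, 12.7, 13.0];
    for p in [50.0, 95.0, 99.0] {
        println!("p{p}: {:?} ms", percentile(&runs, p));
    }
    println!("gate: {:?}", engine_gate(&clickhouse, &stock, &fork));
}
```

Geomean is used for the gates because it weights a 2× regression on a fast query the same as a 2× regression on a slow one, which an arithmetic mean over mixed query sizes would not.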

## Composition with other prompts

This prompt sits inside the four-prompt strategic arc:

1. **`bardioc-weekend-rebuild-prompt.md`**: build the OLD stack honest
   (migration baseline measurement target)
2. **`stack-consolidation-bardioc-to-hhtl.md`**: the architectural reframe
   doc (why the NEW stack wins, four-tier picture)
3. **`ndarray-simd-trojan-horse-prompt.md`**: path A, inject ndarray::simd
   INTO the legacy stack (ClickHouse + Tantivy via FFI); buys time during
   cutover, accelerates legacy
4. **`databend-ndarray-simd-prompt.md`** (this): path C, adopt the
   Rust-native ClickHouse-shape successor with ndarray::simd injection;
   the actual migration TARGET

Combined timeline:
- Weekend 1: prompt 1 (Bardioc baseline)
- Weekend 2: this prompt (Databend integration)
- Weekend 3: prompt 3 (trojan horse; optional, buys cutover time)
- Ongoing: HHTL development (PR-X4 + PR-X9), workload-by-workload cutover

## Follow-on opportunities (NOT this weekend)

- Upstream PR cadence to Databend: 1 PR per parity-or-better kernel; faster
  cycle than ClickHouse because Rust-native (no FFI review burden)
- Polars integration: same ndarray::simd primitives plug into Polars
  DataFrame ops; weekend follow-on
- DataFusion integration: arrow-rs has vectorized kernels for
  filter/take/aggregate; ndarray::simd could plug in there too, benefiting
  the DataFusion-derived ecosystem (GreptimeDB, InfluxDB IOx, Ballista).
  Databend is not in that set, since it runs its own query engine.
- Quickwit integration: combines Tantivy trojan horse + Databend analytics
  in one operational stack