An embeddable Rust table engine. LSM writes, Parquet storage, Iceberg-compatible metadata.
Writes go through a WAL and a skip-list memtable; flushes land as
Apache Parquet-based SSTables. Invoke `db.export_iceberg(path)` when you need an
Iceberg v2 view; DuckDB, Spark, Trino, Snowflake, and pyiceberg read it with
no format conversion.
```rust
use merutable::{MeruDB, OpenOptions};
use merutable::schema::{ColumnDef, ColumnType, TableSchema};
use merutable::value::{FieldValue, Row};

#[tokio::main]
async fn main() -> merutable::error::Result<()> {
    let schema = TableSchema {
        table_name: "events".into(),
        columns: vec![
            ColumnDef { name: "id".into(), col_type: ColumnType::Int64, nullable: false, ..Default::default() },
            ColumnDef { name: "payload".into(), col_type: ColumnType::ByteArray, nullable: true, ..Default::default() },
        ],
        primary_key: vec![0],
        ..Default::default()
    };

    let db = MeruDB::open(OpenOptions::new(schema)).await?;

    db.put(Row::new(vec![
        Some(FieldValue::Int64(1)),
        Some(FieldValue::Bytes(b"hello"[..].into())),
    ])).await?;

    let row = db.get(&[FieldValue::Int64(1)])?;
    println!("{row:?}");

    db.close().await?; // flush + fsync + seal; reads remain valid until drop
    Ok(())
}
```

Structured data that's both write-heavy (agent memory, session state, audit logs, feature stores, embedded time series) and readable by analytical engines without an ETL job. The LSM gives you the fast writes; the Iceberg-compatible metadata layer gives you the analytical reads.
- **Durable LSM write path.** Write-ahead log with 32 KiB block framing and
  CRC32, crossbeam skip-list memtable, graduated writer backpressure on
  L0-file buildup. `visible_seq` advances only after the memtable apply, so
  readers never observe a torn write.
- **Leveled compaction.** Full-rewrite compaction, run in parallel on disjoint
  level sets, with bounded per-job memory, fsync-before-commit, and
  version-pinned GC so a long scan never sees a file disappear mid-read.
- **Iceberg export on demand.** `db.export_iceberg(path)` writes a spec-clean
  Iceberg v2 chain (`metadata.json` + manifest-list Avro + manifest Avro) that
  DuckDB `iceberg_scan`, pyiceberg, Spark, Trino, and Athena consume as-is.
  You call `export_iceberg` when you want the view; merutable's metadata layer
  efficiency is not bound by the Iceberg spec.
- **Change feed.** Committed operations are exposed as a change-feed table
  provider with `seq > N` predicate pushdown and per-DELETE pre-image
  reconstruction.
- **Read-only replica (opt-in).** Base + tail replayed from the change feed;
  rebase hot-swaps behind `ArcSwap`, so in-flight readers never see a torn
  state.
- **Schema evolution.** `db.add_column(ColumnDef)`: reopen accepts the
  extension, reads of pre-evolution files fill defaults, and writes pad short
  rows with `write_default`.
- **Python bindings (via PyO3).** See `crates/merutable-python/`.
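The write-path bullet above describes 32 KiB block framing with CRC32. Here is a minimal, dependency-free sketch of that style of record framing (a length + checksum header, verified on replay); the header layout and field order are illustrative, not merutable's actual on-disk format:

```rust
// 32 KiB WAL blocks, matching the block size named in the feature list.
const BLOCK_SIZE: usize = 32 * 1024;

// CRC-32 (IEEE, reflected polynomial), computed bitwise to stay dependency-free.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

// Frame one record: [len: u32 LE][crc32(payload): u32 LE][payload].
fn frame(payload: &[u8]) -> Vec<u8> {
    let mut rec = Vec::with_capacity(8 + payload.len());
    rec.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    rec.extend_from_slice(&crc32(payload).to_le_bytes());
    rec.extend_from_slice(payload);
    rec
}

// Replay one record from a block; a failed CRC means a torn or corrupt tail,
// which is where replay stops.
fn read_record(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 8 {
        return None;
    }
    let len = u32::from_le_bytes(buf[0..4].try_into().ok()?) as usize;
    let crc = u32::from_le_bytes(buf[4..8].try_into().ok()?);
    let payload = buf.get(8..8 + len)?;
    if crc32(payload) != crc {
        return None;
    }
    Some((payload, 8 + len))
}

fn main() {
    assert_eq!(crc32(b"123456789"), 0xCBF4_3926); // standard CRC-32 check value
    let rec = frame(b"put id=1 payload=hello");
    assert!(rec.len() <= BLOCK_SIZE); // small records pack into one block
    let (payload, consumed) = read_record(&rec).unwrap();
    assert_eq!(payload, &b"put id=1 payload=hello"[..]);
    assert_eq!(consumed, rec.len());
    let mut torn = rec.clone();
    torn[10] ^= 1; // flip one payload bit
    assert!(read_record(&torn).is_none()); // the CRC check rejects the torn record
}
```

The point of checksumming the payload rather than trusting the length alone is that a crash mid-write leaves a record that parses but fails the CRC, so replay can stop cleanly at the torn tail.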
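The schema-evolution bullet says reads of pre-evolution files fill defaults and writes pad short rows with `write_default`. A dependency-free sketch of the default-fill idea, using simplified stand-in types (not merutable's real `ColumnDef`/`FieldValue`):

```rust
// Simplified stand-ins for merutable's column/value types (illustrative only).
#[derive(Clone, Debug, PartialEq)]
enum FieldValue {
    Int64(i64),
    Str(String),
}

struct ColumnDef {
    name: String,
    default: Option<FieldValue>, // None models a nullable column with no default
}

// Reading a pre-evolution file: rows there are shorter than the current schema,
// so missing trailing columns are filled with their declared defaults.
fn fill_defaults(row: &mut Vec<Option<FieldValue>>, schema: &[ColumnDef]) {
    while row.len() < schema.len() {
        row.push(schema[row.len()].default.clone());
    }
}

fn main() {
    let schema = vec![
        ColumnDef { name: "id".into(), default: None },
        ColumnDef { name: "tag".into(), default: Some(FieldValue::Str("n/a".into())) },
    ];
    // A row persisted before `tag` was added carries only one field.
    let mut old_row = vec![Some(FieldValue::Int64(1))];
    fill_defaults(&mut old_row, &schema);
    assert_eq!(old_row.len(), schema.len());
    assert_eq!(old_row[1], Some(FieldValue::Str("n/a".into())));
    println!("{:?} (padded through column {})", old_row, schema[1].name);
}
```

The same padding logic, applied on the write path instead of the read path, is what a `write_default`-style option amounts to: short rows are extended to the current schema width before they reach the memtable.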
```toml
[dependencies]
merutable = "0.0.1"
```

```text
          ┌──────── your process ────────┐
writes ──▶│ WAL → memtable → flush → SST │
reads  ◀──│ memtable ∪ L0 ∪ L1…          │
          └─────────────┬────────────────┘
                        │ Parquet files on disk
                        ▼
           db.export_iceberg(path)
                        │
                        ▼
          DuckDB / Spark / Trino / pyiceberg
```
Deeper reads: `docs/architecture.svg` · `docs/SEMANTICS.md` ·
`docs/EXTERNAL_READS.md` · `docs/MIRROR.md` · `docs/SCALE_OUT_REPLICA.md` ·
`docs/TAXONOMY.md` · `DEVELOPER.md`
`lab/lab_merutable.ipynb` — a live, runnable showcase comparing merutable
against DuckDB head-to-head, then demonstrating the zero-ETL federated read
(fresh memtable rows inside merutable, columnar analytical reads from DuckDB
against the same on-disk Parquet).

```shell
cd lab && bash setup.sh
```

| Area | 0.0.1 |
|---|---|
| Storage format | LSM-tree layout optimized for both row and columnar access; Iceberg v2-compatible. |
| Durability | fsync on SST write, fsync on WAL, fsync on manifest commit. |
| Concurrency | Designed for one primary writer per catalog (not yet lock-enforced); many concurrent readers via version pinning. |
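The concurrency row promises many concurrent readers via version pinning, and the replica feature hot-swaps behind `ArcSwap`. A std-only sketch of the pinning idea, with a `Mutex<Arc<_>>` standing in for the lock-free `ArcSwap` the real engine is described as using (type and file names here are hypothetical):

```rust
use std::sync::{Arc, Mutex};

// A version pins the exact set of SST files a scan may read. GC must keep a
// file alive while any pinned version still references it.
#[derive(Debug)]
struct Version {
    sst_files: Vec<String>,
}

struct VersionSet {
    // merutable is described as swapping behind ArcSwap; a Mutex<Arc<_>> gives
    // the same snapshot semantics in a dependency-free sketch.
    current: Mutex<Arc<Version>>,
}

impl VersionSet {
    /// Readers pin the current version: a cheap Arc clone.
    fn pin(&self) -> Arc<Version> {
        self.current.lock().unwrap().clone()
    }

    /// The writer installs a new version after a flush or compaction commits.
    /// In-flight readers keep their old Arc, so their files never vanish mid-read.
    fn install(&self, v: Version) {
        *self.current.lock().unwrap() = Arc::new(v);
    }
}

fn main() {
    let set = VersionSet {
        current: Mutex::new(Arc::new(Version { sst_files: vec!["L0-000001.parquet".into()] })),
    };
    let pinned = set.pin(); // a long scan starts and pins its snapshot
    set.install(Version { sst_files: vec!["L1-000001.parquet".into()] }); // compaction commits
    // The in-flight scan still sees its pinned file set...
    assert_eq!(pinned.sst_files, vec!["L0-000001.parquet".to_string()]);
    // ...while new readers see the post-compaction version.
    assert_eq!(set.pin().sst_files, vec!["L1-000001.parquet".to_string()]);
}
```

Dropping the last `Arc<Version>` that references a retired file is what makes that file eligible for GC, which is why "a long scan never sees a file disappear mid-read".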
Named after Mount Meru — the axis around which the cosmos is ordered in Indian cosmology.