merutable/merutable: An embedded single-table engine in Rust whose data is both row-oriented and columnar and whose metadata is Iceberg-compatible. Write rows through a key-value API, then query them with SQL from DuckDB/Spark/Trino/Snowflake/SFDataCloud with zero ETL.

An embeddable Rust table engine. LSM writes, Parquet storage, Iceberg-compatible metadata.

Writes go through a WAL and a skip-list memtable; flushes land as Apache Parquet-based SSTables. Call db.export_iceberg(path) when you need an Iceberg v2 view: DuckDB, Spark, Trino, Snowflake, and pyiceberg read it with no format conversion.

use merutable::{MeruDB, OpenOptions};
use merutable::schema::{ColumnDef, ColumnType, TableSchema};
use merutable::value::{FieldValue, Row};

#[tokio::main]
async fn main() -> merutable::error::Result<()> {
    let schema = TableSchema {
        table_name: "events".into(),
        columns: vec![
            ColumnDef { name: "id".into(),      col_type: ColumnType::Int64,     nullable: false, ..Default::default() },
            ColumnDef { name: "payload".into(), col_type: ColumnType::ByteArray, nullable: true,  ..Default::default() },
        ],
        primary_key: vec![0],
        ..Default::default()
    };

    let db = MeruDB::open(OpenOptions::new(schema)).await?;

    db.put(Row::new(vec![
        Some(FieldValue::Int64(1)),
        Some(FieldValue::Bytes(b"hello"[..].into())),
    ])).await?;

    let row = db.get(&[FieldValue::Int64(1)])?;
    println!("{row:?}");

    db.close().await?;   // flush + fsync + seal; reads remain until drop
    Ok(())
}
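
The ordering behind that write path (log first, then apply to the memtable, then publish a visible sequence number) can be sketched with the standard library alone. This is a toy model, not merutable's implementation: `ToyEngine`, its fields, and the use of a `BTreeMap` in place of the skip list are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical toy engine: a Vec stands in for the WAL and a BTreeMap
/// stands in for the skip-list memtable.
struct ToyEngine {
    wal: Mutex<Vec<(u64, i64, Vec<u8>)>>,           // (seq, key, value) records
    memtable: Mutex<BTreeMap<(i64, u64), Vec<u8>>>, // ordered by (key, seq)
    next_seq: AtomicU64,
    visible_seq: AtomicU64,
}

impl ToyEngine {
    fn new() -> Self {
        ToyEngine {
            wal: Mutex::new(Vec::new()),
            memtable: Mutex::new(BTreeMap::new()),
            next_seq: AtomicU64::new(0),
            visible_seq: AtomicU64::new(0),
        }
    }

    /// Returns the sequence number assigned to this write.
    fn put(&self, key: i64, value: &[u8]) -> u64 {
        // Holding the WAL lock for the whole call serializes writers,
        // so sequence numbers are published in order.
        let mut wal = self.wal.lock().unwrap();
        let seq = self.next_seq.fetch_add(1, Ordering::Relaxed) + 1;
        // 1. Log the record first (a real engine would fsync here).
        wal.push((seq, key, value.to_vec()));
        // 2. Apply it to the memtable.
        self.memtable.lock().unwrap().insert((key, seq), value.to_vec());
        // 3. Only now make it visible to readers.
        self.visible_seq.store(seq, Ordering::Release);
        seq
    }

    /// Returns the newest value for `key` among visible writes.
    fn get(&self, key: i64) -> Option<Vec<u8>> {
        let snapshot = self.visible_seq.load(Ordering::Acquire);
        let mem = self.memtable.lock().unwrap();
        mem.range((key, 0)..=(key, snapshot))
            .next_back()
            .map(|(_, v)| v.clone())
    }
}
```

A reader that loads `visible_seq` before step 3 completes simply ignores the in-flight entry, which is the sense in which a write is never observed half-applied in this model.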

When merutable fits

Structured data that is both write-heavy (agent memory, session state, audit logs, feature stores, embedded time-series) and readable by analytical engines without an ETL job. The LSM tree gives you fast writes; the Iceberg-compatible metadata layer gives you analytical reads.

What's in the box

Durable LSM write path. Write-ahead log with 32 KiB block framing and CRC32, crossbeam skip-list memtable, graduated writer backpressure on L0-file buildup. visible_seq advances only after the memtable apply, so readers never observe a torn write.
Leveled compaction. Full-rewrite compactions run in parallel on disjoint level sets with bounded per-job memory and fsync-before-commit; version-pinned GC ensures a long scan never sees a file disappear mid-read.
Iceberg export on demand. db.export_iceberg(path) writes a spec-clean Iceberg v2 chain (metadata.json + manifest-list Avro + manifest Avro) that DuckDB iceberg_scan, pyiceberg, Spark, Trino, and Athena consume as-is. The export runs only when you call export_iceberg, so the efficiency of merutable's own metadata layer is not bound by the Iceberg spec.
Change feed. Committed operations are exposed as a change feed table provider with seq > N predicate pushdown and per-DELETE pre-image reconstruction.
Read-only replica (opt-in). Base + tail replayed from the change feed; rebase hot-swaps behind ArcSwap so in-flight readers never see a torn state.
Schema evolution. db.add_column(ColumnDef) appends a column: reopen accepts the extended schema, reads of pre-evolution files fill in defaults, and writes pad short rows with write_default.
Python bindings (via PyO3). crates/merutable-python/.
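
To make the "32 KiB block framing and CRC32" item above concrete, here is a minimal sketch of a checksummed, block-aligned log. The record layout (4-byte CRC, 4-byte length, payload), the zero-padding rule, and the function names are assumptions chosen for brevity; merutable's actual WAL format may differ, for example by fragmenting records across blocks as LevelDB-style logs do.

```rust
const BLOCK_SIZE: usize = 32 * 1024;
const HEADER_LEN: usize = 8; // crc32 (4 bytes) + payload length (4 bytes)

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Appends one record. In this sketch a record never spans two blocks:
/// if it does not fit, the rest of the block is zero-padded.
fn append_record(log: &mut Vec<u8>, payload: &[u8]) -> Result<(), String> {
    if payload.is_empty() {
        return Err("empty payloads are reserved as the padding marker".into());
    }
    let needed = HEADER_LEN + payload.len();
    if needed > BLOCK_SIZE {
        return Err("payload too large for one block in this sketch".into());
    }
    let used = log.len() % BLOCK_SIZE;
    if used + needed > BLOCK_SIZE {
        let padded_len = log.len() + (BLOCK_SIZE - used);
        log.resize(padded_len, 0);
    }
    log.extend_from_slice(&crc32(payload).to_le_bytes());
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(payload);
    Ok(())
}

/// Replays the log, verifying each record's checksum.
fn read_records(log: &[u8]) -> Result<Vec<Vec<u8>>, String> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < log.len() {
        let left_in_block = BLOCK_SIZE - pos % BLOCK_SIZE;
        if left_in_block < HEADER_LEN {
            pos += left_in_block; // tail too small for a header: padding
            continue;
        }
        if pos + HEADER_LEN > log.len() {
            return Err("truncated header".into());
        }
        let crc = u32::from_le_bytes([log[pos], log[pos + 1], log[pos + 2], log[pos + 3]]);
        let len = u32::from_le_bytes([log[pos + 4], log[pos + 5], log[pos + 6], log[pos + 7]]) as usize;
        if len == 0 {
            pos += left_in_block; // zeroed header: skip to the next block
            continue;
        }
        let start = pos + HEADER_LEN;
        let end = start + len;
        if end > log.len() {
            return Err("truncated payload".into());
        }
        if crc32(&log[start..end]) != crc {
            return Err(format!("checksum mismatch at offset {}", pos));
        }
        out.push(log[start..end].to_vec());
        pos = end;
    }
    Ok(out)
}
```

Aligning records to fixed blocks means a reader that hits a corrupt region can resynchronize at the next block boundary instead of losing the rest of the log; this sketch stops at the first error for simplicity.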

Install

[dependencies]
merutable = "0.0.1"

Architecture at a glance

          ┌──────── your process ────────┐
writes ──▶│ WAL → memtable → flush → SST │
reads  ◀──│   memtable  ∪  L0  ∪  L1…    │
          └─────────────┬────────────────┘
                        │  Parquet files on disk
                        ▼
              db.export_iceberg(path)
                        │
                        ▼
           DuckDB / Spark / Trino / pyiceberg

Deeper reads: docs/architecture.svg · docs/SEMANTICS.md · docs/EXTERNAL_READS.md · docs/MIRROR.md · docs/SCALE_OUT_REPLICA.md · docs/TAXONOMY.md · DEVELOPER.md
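
The read line in the diagram (memtable ∪ L0 ∪ L1…) implies a newest-first lookup across layers. The sketch below shows that general LSM technique; the `Layers` type, the tombstone representation, and collapsing each level into a single map are simplifying assumptions, not merutable's data structures.

```rust
use std::collections::BTreeMap;

/// One sorted run: key -> Some(value) for a put, None for a delete tombstone.
type Run = BTreeMap<i64, Option<String>>;

/// Hypothetical layered view of the table.
struct Layers {
    memtable: Run,
    l0: Vec<Run>, // flushed files, newest last
    l1: Run,      // deeper level, collapsed to one map for brevity
}

impl Layers {
    /// Searches newest to oldest; the first layer that mentions the key wins,
    /// so a tombstone in a newer layer hides older values.
    fn get(&self, key: i64) -> Option<&str> {
        let runs = std::iter::once(&self.memtable)
            .chain(self.l0.iter().rev())
            .chain(std::iter::once(&self.l1));
        for run in runs {
            if let Some(entry) = run.get(&key) {
                return entry.as_deref();
            }
        }
        None
    }
}
```

With a memtable holding `1 -> "new"` and a tombstone for key 3, an L0 file holding `1 -> "old"`, and L1 holding `3 -> "three"`, `get(1)` returns `Some("new")` and `get(3)` returns `None`.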

Lab notebook

lab/lab_merutable.ipynb — a live, runnable showcase comparing merutable against DuckDB head-to-head, then demonstrating the zero-ETL federated read (fresh memtable rows inside merutable, columnar analytical reads from DuckDB against the same on-disk Parquet).

cd lab && bash setup.sh

Status

Area	0.0.1
Storage format	LSM tree layout optimized for both row and columnar. Iceberg v2-compatible.
Durability	fsync on SST write, fsync on WAL, fsync on manifest commit.
Concurrency	Designed for one primary writer per catalog (not yet lock-enforced); many concurrent readers via version pinning.
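
Version pinning, as named in the Concurrency row, can be illustrated with reference counting: a reader holds a handle to an immutable file list, and garbage collection deletes only files that no live version references. This is a minimal sketch under assumed names (`Version`, `VersionSet`, `pin`, `install`, `collect_garbage`); it is not merutable's API.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Mutex};

/// Immutable snapshot of the files that make up the table.
struct Version {
    files: BTreeSet<String>,
}

/// Hypothetical version set: one current version plus retired ones
/// that readers may still hold.
struct VersionSet {
    current: Mutex<Arc<Version>>,
    retired: Mutex<Vec<Arc<Version>>>,
}

fn make_version(files: &[&str]) -> Arc<Version> {
    Arc::new(Version { files: files.iter().map(|f| f.to_string()).collect() })
}

impl VersionSet {
    fn new(files: &[&str]) -> Self {
        VersionSet {
            current: Mutex::new(make_version(files)),
            retired: Mutex::new(Vec::new()),
        }
    }

    /// A reader pins the current version for the duration of its scan.
    fn pin(&self) -> Arc<Version> {
        Arc::clone(&self.current.lock().unwrap())
    }

    /// A flush or compaction installs a new file list; the old one is retired.
    fn install(&self, files: &[&str]) {
        let old = std::mem::replace(&mut *self.current.lock().unwrap(), make_version(files));
        self.retired.lock().unwrap().push(old);
    }

    /// Returns files that are safe to delete: those referenced only by
    /// retired versions that no reader still holds.
    fn collect_garbage(&self) -> BTreeSet<String> {
        let current = self.pin();
        let mut retired = self.retired.lock().unwrap();
        let mut candidates = BTreeSet::new();
        let mut still_pinned = Vec::new();
        for version in retired.drain(..) {
            if Arc::strong_count(&version) == 1 {
                candidates.extend(version.files.iter().cloned());
            } else {
                still_pinned.push(version);
            }
        }
        // Never delete a file that a pinned or current version still lists.
        for version in still_pinned.iter() {
            for file in &version.files {
                candidates.remove(file);
            }
        }
        for file in &current.files {
            candidates.remove(file);
        }
        *retired = still_pinned;
        candidates
    }
}
```

A reader that pinned `{a.parquet, b.parquet}` before a compaction installed `{c.parquet}` keeps both old files alive; once the reader drops its handle, the next `collect_garbage` call returns them for deletion.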

Named after Mount Meru — the axis around which the cosmos is ordered in Indian cosmology.
