[DISCUSS] Split the current `iceberg` library into `iceberg-core` and `iceberg-data` #627

I would like to propose splitting the current `iceberg` library into two lower-level targets:

- `iceberg-core`
- `iceberg-data`

## Motivation

Today the `iceberg` library appears to include both:

1. **metadata / planning / model-layer functionality**
   - schema, types, partition spec, sort order
   - table metadata, snapshots, transactions, updates
   - manifests
   - expressions
   - catalog abstractions and in-memory catalog
   - general utilities and file abstraction APIs

2. **data access / execution-layer functionality**
   - data file readers and writers
   - delete file readers and writers
   - delete loading and filtering
   - merge-on-read execution
   - Puffin reader/writer support

These are conceptually different layers, and separating them would make the project structure clearer and easier to evolve.

In particular, the data path is likely to grow independently over time as support expands for more read/write behaviors, delete handling, Puffin, and execution-oriented features. Splitting it out would narrow the scope of the core library and make each target's responsibilities explicit.

## Proposed direction

### `iceberg-core`

This target would contain the metadata/model/planning layer, including things such as:

- schema / type / partition / sort / transform
- table / snapshot / metadata / requirements / updates / transactions
- manifest handling
- expressions
- catalog abstractions and in-memory catalog
- generic utilities
- file format declarations and file I/O abstractions
- possibly the abstract reader/writer interfaces, depending on the final boundary decision

### `iceberg-data`

This target would contain data-file-oriented logic, including:

- data writer and reader logic
- delete file writer and reader logic
- `DeleteLoader`
- delete filter logic
- merge-on-read reader/execution logic
- Puffin reader and writer support
- supporting delete/data execution structures that are primarily used by these paths

## Compatibility

This is a breaking change. We could keep `iceberg` as an aggregate (umbrella) compatibility target that links both `iceberg-core` and `iceberg-data`, but that does not seem like a wise decision at this moment.

## Why this seems feasible

The repository already has some useful structure that suggests this split is practical:

- the build already distinguishes `iceberg`, `iceberg-bundle`, and `iceberg-rest`
- source layout already separates areas like `data/`, `deletes/`, `puffin/`, `manifest/`, `expression/`, `update/`, etc.
- there are already format-agnostic reader/writer abstractions and factory registration points, which should help define a stable boundary

So this looks less like a brand-new architecture and more like making an existing separation explicit at the build and module level.
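To make the "factory registration point" idea concrete, here is a minimal sketch of how such a boundary could look. Every name in it (`sketch::Reader`, `ReaderRegistry`, `FakeParquetReader`, the `FileFormat` values) is hypothetical and chosen for illustration; it is not the existing API, only one way the interface and registry could sit on the core side while concrete readers register from elsewhere.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// All names below are hypothetical; this is not the actual library API.
namespace sketch {

enum class FileFormat { kParquet, kAvro };

// Format-agnostic interface: the part that could stay on the core side.
class Reader {
 public:
  virtual ~Reader() = default;
  virtual std::string FormatName() const = 0;
};

using ReaderFactory = std::function<std::unique_ptr<Reader>()>;

// Registration point: format implementations plug in here, so callers
// never need to name a concrete reader type.
class ReaderRegistry {
 public:
  static ReaderRegistry& Instance() {
    static ReaderRegistry registry;
    return registry;
  }

  void Register(FileFormat format, ReaderFactory factory) {
    factories_[format] = std::move(factory);
  }

  // Throws if nothing has been registered for the requested format.
  std::unique_ptr<Reader> Create(FileFormat format) const {
    auto it = factories_.find(format);
    if (it == factories_.end()) {
      throw std::runtime_error("no reader registered for this format");
    }
    return it->second();
  }

 private:
  std::map<FileFormat, ReaderFactory> factories_;
};

// Stand-in for a concrete reader that would live outside the core target.
class FakeParquetReader : public Reader {
 public:
  std::string FormatName() const override { return "parquet"; }
};

}  // namespace sketch
```

Under this sketch, a target outside core would call `ReaderRegistry::Instance().Register(...)` once at startup, and code on either side of the split would only ever see `sketch::Reader`.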

## Main design question

The main point that likely needs discussion is the precise boundary between planning/core and execution/data.

In particular:

- should the abstract `file_reader` / `file_writer` interfaces stay in `iceberg-core`, with `iceberg-data` building on top of them?
- or should all reader/writer-related APIs move to `iceberg-data`?

My initial preference is to keep the abstract, format-agnostic interfaces in `iceberg-core`, and move the higher-level data/delete/Puffin/MOR logic into `iceberg-data`. That seems like the cleanest layering, but I would be interested in feedback.
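The preferred layering can be sketched as follows, under heavy simplification. The namespaces `core_sketch` and `data_sketch`, the classes `FileReader`, `RangeReader`, and `DeleteFilteringReader`, and the row-position-only `Next()` signature are all assumptions made for illustration; the real interfaces carry schemas, batches, and error handling that are omitted here. The point is only the dependency direction: the data layer depends on the core abstraction, never the reverse.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for what would stay in `iceberg-core`.
namespace core_sketch {

class FileReader {
 public:
  virtual ~FileReader() = default;
  // Returns the next row position, or std::nullopt at end of file.
  virtual std::optional<int64_t> Next() = 0;
};

}  // namespace core_sketch

// Hypothetical stand-in for what would move to `iceberg-data`.
namespace data_sketch {

// In-memory reader that yields positions 0..count-1 (test stand-in only).
class RangeReader : public core_sketch::FileReader {
 public:
  explicit RangeReader(int64_t count) : count_(count) {}
  std::optional<int64_t> Next() override {
    if (next_ >= count_) return std::nullopt;
    return next_++;
  }

 private:
  int64_t count_;
  int64_t next_ = 0;
};

// Merge-on-read style wrapper: skips rows whose positions were deleted.
class DeleteFilteringReader : public core_sketch::FileReader {
 public:
  DeleteFilteringReader(std::unique_ptr<core_sketch::FileReader> inner,
                        std::unordered_set<int64_t> deleted)
      : inner_(std::move(inner)), deleted_(std::move(deleted)) {}

  std::optional<int64_t> Next() override {
    while (auto pos = inner_->Next()) {
      if (deleted_.count(*pos) == 0) return pos;
    }
    return std::nullopt;
  }

 private:
  std::unique_ptr<core_sketch::FileReader> inner_;
  std::unordered_set<int64_t> deleted_;
};

// Drains any reader through the core interface only.
inline std::vector<int64_t> ReadAll(core_sketch::FileReader& reader) {
  std::vector<int64_t> rows;
  while (auto pos = reader.Next()) rows.push_back(*pos);
  return rows;
}

}  // namespace data_sketch
```

With five rows and positional deletes on rows 1 and 3, draining a `DeleteFilteringReader` through `ReadAll` yields positions 0, 2, and 4. If the alternative boundary were chosen instead, `FileReader` itself would move into the data-side namespace and core would have no reader type at all.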

If this direction sounds reasonable, I’d be happy to work on this.
