[flink] union read from lake with startup timestamp filter by zuston · Pull Request #3236 · apache/fluss

zuston · 2026-04-30T04:02:27Z

Purpose

Leveraging the union read mechanism, we can achieve low disk overhead while streaming long-term data from the data lake. However, when using the DataStream API as described in the documentation, I observed that lake splits are not generated when consuming with a timestamp-based offset.

This PR addresses that limitation by enabling timestamp filter pushdown to the lake layer. Currently, this capability is only supported for log tables. (It is somewhat unintuitive to use timestamp-based consumption for PK tables.)

Brief change log

- Enable timestamp filter pushdown in the lake source layer
- Use the offset to define a clear boundary between LakeSplit and LogSplit
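
The offset boundary described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `UnionReadPlanSketch`, `LakeFile`, `Plan`, and the parameters `snapshotEndOffset` (the log offset up to which the bucket is assumed to have been tiered into the lake) and `timestampOffset` (the log offset assumed to correspond to the startup timestamp) are all hypothetical names, and no Fluss or Flink classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionReadPlanSketch {

    /** Hypothetical summary of one lake file in a bucket. */
    public static final class LakeFile {
        public final String name;
        public final long maxTimestamp;

        public LakeFile(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Hypothetical plan: lake files to read, then the offset where log reading starts. */
    public static final class Plan {
        public final List<String> lakeFiles;
        public final long logStartOffset;

        public Plan(List<String> lakeFiles, long logStartOffset) {
            this.lakeFiles = lakeFiles;
            this.logStartOffset = logStartOffset;
        }
    }

    public static Plan plan(
            List<LakeFile> files,
            long snapshotEndOffset,
            long timestampOffset,
            long startupTimestamp) {
        // The startup position lies at or beyond the tiered range:
        // nothing is needed from the lake, so read only the log.
        if (timestampOffset >= snapshotEndOffset) {
            return new Plan(new ArrayList<>(), timestampOffset);
        }
        // Otherwise prune lake files that end before the startup timestamp
        // (the pushed-down filter) and start the log at the snapshot boundary,
        // so the lake part and the log part do not overlap.
        List<String> selected = new ArrayList<>();
        for (LakeFile f : files) {
            if (f.maxTimestamp >= startupTimestamp) {
                selected.add(f.name);
            }
        }
        return new Plan(selected, snapshotEndOffset);
    }

    public static void main(String[] args) {
        List<LakeFile> files =
                Arrays.asList(new LakeFile("f1", 100L), new LakeFile("f2", 200L));
        Plan p = plan(files, 50L, 20L, 150L);
        System.out.println(p.lakeFiles + " then log from offset " + p.logStartOffset);
    }
}
```

In this sketch, records inside a selected lake file would still need a row-level timestamp filter when read, since file-level pruning only discards files that lie entirely before the startup timestamp.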

Tests

API and Format

Documentation

zuston marked this pull request as draft April 30, 2026 04:02

[flink] union read from lake with startup timestamp filter

e03645b

zuston force-pushed the timestampLake branch from d111131 to e03645b Compare April 30, 2026 06:47

zuston marked this pull request as ready for review April 30, 2026 07:50

zuston wants to merge 1 commit into apache:main from zuston:timestampLake
