[flink] union read from lake with startup timestamp filter#3236

Open
zuston wants to merge 1 commit into apache:main from zuston:timestampLake

zuston commented Apr 30, 2026

Purpose

The union read mechanism makes it possible to stream long-term historical data from the data lake with low disk overhead. However, when using the DataStream API as described in the documentation, I observed that no lake splits are generated when the source starts from a timestamp-based offset.

This PR addresses that limitation by pushing the startup timestamp filter down to the lake layer. Currently, this is supported only for log tables. (Timestamp-based consumption is somewhat unintuitive for primary-key tables.)
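To illustrate what "timestamp filter pushdown to the lake layer" can mean for a log table, here is a minimal sketch, assuming each lake row carries the original log append timestamp and each lake data file exposes min/max timestamp statistics. The class, record, and method names (`TimestampPushdownSketch`, `LakeFile`, `LakeRow`, `pruneFiles`, `filterRows`) are hypothetical and are not Fluss or Paimon APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: these types are illustrative, not Fluss/Paimon APIs.
final class TimestampPushdownSketch {

    /** Hypothetical lake data file with min/max log-timestamp statistics. */
    record LakeFile(String path, long minTimestamp, long maxTimestamp) {}

    /** Hypothetical lake row carrying the append timestamp of the original log record. */
    record LakeRow(long timestamp, String payload) {}

    /** File-level pruning: skip a file when every row in it is older than the startup timestamp. */
    static List<LakeFile> pruneFiles(List<LakeFile> files, long startupTimestamp) {
        List<LakeFile> kept = new ArrayList<>();
        for (LakeFile file : files) {
            if (file.maxTimestamp() >= startupTimestamp) {
                kept.add(file);
            }
        }
        return kept;
    }

    /** Row-level filter: files that straddle the startup timestamp still contain older rows. */
    static List<LakeRow> filterRows(List<LakeRow> rows, long startupTimestamp) {
        List<LakeRow> kept = new ArrayList<>();
        for (LakeRow row : rows) {
            if (row.timestamp() >= startupTimestamp) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

File pruning avoids reading data that is entirely before the startup timestamp, while the row filter keeps the result exact for files whose timestamp range spans the cut-off.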

Brief change log

  1. Enable timestamp filter pushdown in the lake source layer
  2. Define clear split boundaries between LakeSplit and LogSplit by using the offset
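The offset-based boundary between the lake part and the log part of one bucket can be sketched as follows. This is a hypothetical illustration, not the PR's implementation: it assumes the bucket's tiered lake data covers log offsets up to an exclusive `lakeEndOffset`, that the first offset at or after the startup timestamp (`timestampOffset`) is already known, and that timestamps are non-decreasing in offset order. All names here are made up for the example.

```java
import java.util.Optional;

// Hypothetical per-bucket split planning; names are illustrative, not Fluss APIs.
final class SplitBoundarySketch {

    /** Lake part: rows with offset < lakeEndOffset and timestamp >= startupTimestamp. */
    record LakeSplit(int bucket, long lakeEndOffset, long startupTimestamp) {}

    /** Log part: reads the bucket's log from startingOffset onwards. */
    record LogSplit(int bucket, long startingOffset) {}

    /** Plan for one bucket; the lake split is absent when the lake has nothing to contribute. */
    record BucketPlan(Optional<LakeSplit> lakeSplit, LogSplit logSplit) {}

    /**
     * @param lakeEndOffset   exclusive log offset up to which the bucket has been tiered to the lake (assumed input)
     * @param timestampOffset first log offset whose timestamp >= startupTimestamp (assumed input)
     */
    static BucketPlan plan(int bucket, long lakeEndOffset, long timestampOffset, long startupTimestamp) {
        if (timestampOffset >= lakeEndOffset) {
            // The startup position is beyond what the lake holds: read from the log only.
            return new BucketPlan(Optional.empty(), new LogSplit(bucket, timestampOffset));
        }
        // The lake serves the timestamp-filtered range below lakeEndOffset and the log
        // resumes exactly at lakeEndOffset, so the two splits neither overlap nor leave a gap.
        return new BucketPlan(
                Optional.of(new LakeSplit(bucket, lakeEndOffset, startupTimestamp)),
                new LogSplit(bucket, lakeEndOffset));
    }
}
```

Using the offset, rather than the timestamp, as the hand-off point keeps the boundary exact even when several records share the same timestamp.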

Tests

API and Format

Documentation

zuston marked this pull request as draft April 30, 2026 04:02
zuston marked this pull request as ready for review April 30, 2026 07:50