[server] Support configurable time partition format for auto-partitioned tables#3200
wattt3 wants to merge 2 commits into apache:main
@luoyuxia PTAL, thanks.
Pull request overview
Adds a new table option to customize the string format used for time-based auto-partition values, propagating that option through partition generation/retention logic and documenting the new behavior.
Changes:
- Introduce `table.auto-partition.time-format` (no default; derived from `time-unit` when unset) and wire it into auto-partition creation and retention.
- Validate custom time format pattern syntax during table descriptor validation, with new unit tests.
- Update docs and existing tests/call sites for updated partition utility APIs.
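The fallback described above (use the custom format when set, otherwise derive it from the time unit) can be sketched roughly as follows. This is an illustration only: the class, enum, and method names are hypothetical and are not the PR's `PartitionUtils` API. The default patterns are the ones listed in the `table.auto-partition.time-unit` documentation.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical illustration; not the actual Fluss implementation. */
final class TimeFormatResolution {

    /** Time units listed in the option documentation. */
    enum TimeUnit { HOUR, DAY, MONTH, QUARTER, YEAR }

    /** Default pattern per unit, as documented for table.auto-partition.time-unit. */
    static String defaultPattern(TimeUnit unit) {
        switch (unit) {
            case HOUR: return "yyyyMMddHH";
            case DAY: return "yyyyMMdd";
            case MONTH: return "yyyyMM";
            case QUARTER: return "yyyyQ";
            case YEAR: return "yyyy";
            default: throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }

    /** Formats a partition value, preferring the custom format when one is set. */
    static String partitionValue(LocalDateTime time, TimeUnit unit, String customFormat) {
        String pattern = customFormat != null ? customFormat : defaultPattern(unit);
        // Locale.ROOT keeps the output independent of the JVM's default locale.
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT).format(time);
    }
}
```

Under these assumptions, the custom format changes only how the value is rendered; the partition granularity still comes from the time unit.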
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| website/docs/table-design/data-distribution/partitioning.md | Documents the new table.auto-partition.time-format option for auto-partitioned tables. |
| website/docs/engine-flink/options.md | Exposes the new option in Flink engine table options documentation. |
| fluss-server/src/test/java/org/apache/fluss/server/utils/TableDescriptorValidationTest.java | Adds coverage for accepting/rejecting custom time-format values at table creation validation time. |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/TableManagerITCase.java | Updates test helper call site for new generateAutoPartition(...) signature. |
| fluss-server/src/main/java/org/apache/fluss/server/utils/TableDescriptorValidation.java | Adds table-create-time validation for time-format pattern syntax. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/AutoPartitionManager.java | Passes time-format through to partition pre-creation and retention cutoff calculation. |
| fluss-common/src/test/java/org/apache/fluss/utils/PartitionUtilsTest.java | Adds tests for partition name generation with a custom time format. |
| fluss-common/src/main/java/org/apache/fluss/utils/PartitionUtils.java | Extends partition time generation/validation to accept an optional custom time-format and uses Locale.ROOT. |
| fluss-common/src/main/java/org/apache/fluss/utils/AutoPartitionStrategy.java | Adds timeFormat to the resolved auto-partition strategy from table options. |
| fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java | Defines the new table.auto-partition.time-format config option and its description. |

    String timeFormat = autoPartition.timeFormat();
    if (timeFormat != null) {
        if (timeFormat.trim().isEmpty()) {
            throw new InvalidConfigException(
                    String.format(
                            "'%s' must not be empty.",
                            ConfigOptions.TABLE_AUTO_PARTITION_TIME_FORMAT.key()));
        }
        try {
            DateTimeFormatter.ofPattern(timeFormat, Locale.ROOT);
        } catch (IllegalArgumentException e) {
            throw new InvalidConfigException(
                    String.format(
                            "Invalid time format '%s' for '%s': %s",
                            timeFormat,
                            ConfigOptions.TABLE_AUTO_PARTITION_TIME_FORMAT.key(),
                            e.getMessage()));
        }

table.auto-partition.time-format is only validated for DateTimeFormatter pattern syntax. It can still generate partition values that are invalid for Fluss/ZooKeeper paths (e.g., /, ., spaces) or collide for a given time-unit (e.g., time-unit=HOUR with format yyyy-MM-dd produces identical values for different hours). This will lead to auto-partition creation failures or incorrect retention behavior at runtime. Consider validating at table creation that a sample formatted value passes the same partition-value rules (TablePath.detectInvalidName/validatePrefix) and that formatting differs between now and now + 1 <time-unit> (and ideally preserves lexicographic order).
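One possible shape for the collision and ordering check suggested here is a sampling test: format a run of consecutive time steps and require each value to sort strictly after the previous one. The sketch below is a heuristic under stated assumptions, not the PR's code; the class and method names are hypothetical, and a sampled run cannot prove ordering for every possible timestamp.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalUnit;
import java.util.Locale;

/** Hypothetical illustration of a create-time sanity check for a custom time format. */
final class TimeFormatGranularityCheck {

    /**
     * Returns true if formatting start, start + 1 step, ..., start + samples steps
     * yields strictly increasing strings. Equal neighbours mean the format is too
     * coarse for the unit; a decreasing pair means string order does not follow time.
     */
    static boolean preservesOrder(
            String pattern, TemporalUnit step, LocalDateTime start, int samples) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        String previous = formatter.format(start);
        for (int i = 1; i <= samples; i++) {
            String next = formatter.format(start.plus(i, step));
            if (next.compareTo(previous) <= 0) {
                return false;
            }
            previous = next;
        }
        return true;
    }
}
```

For example, with `ChronoUnit.HOURS` as the step, `yyyy-MM-dd` fails on the first sample because two different hours format to the same day, while a non-padded pattern such as `d-M-yyyy` with `ChronoUnit.DAYS` fails once the day reaches two digits. Choosing a start at a unit boundary and enough samples to cross a month or year rollover makes the check more useful.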
| table.auto-partition.enabled | Boolean | no | false | Whether enable auto partition for the table. Disable by default. When auto partition is enabled, the partitions of the table will be created automatically. |
| table.auto-partition.key | String | no | (none) | This configuration defines the time-based partition key to be used for auto-partitioning when a table is partitioned with multiple keys. Auto-partitioning utilizes a time-based partition key to handle partitions automatically, including creating new ones and removing outdated ones, by comparing the time value of the partition with the current system time. In the case of a table using multiple partition keys (such as a composite partitioning strategy), this feature determines which key should serve as the primary time dimension for making auto-partitioning decisions. And If the table has only one partition key, this config is not necessary. Otherwise, it must be specified. |
| table.auto-partition.time-unit | ENUM | no | DAY | The time granularity for auto created partitions. The default value is 'DAY'. Valid values are 'HOUR', 'DAY', 'MONTH', 'QUARTER', 'YEAR'. If the value is 'HOUR', the partition format for auto created is yyyyMMddHH. If the value is 'DAY', the partition format for auto created is yyyyMMdd. If the value is 'MONTH', the partition format for auto created is yyyyMM. If the value is 'QUARTER', the partition format for auto created is yyyyQ. If the value is 'YEAR', the partition format for auto created is yyyy. |
| table.auto-partition.time-format | String | no | (derived from unit) | The time format used for auto-created partition values. If not set, the format is derived from `table.auto-partition.time-unit` (e.g. `yyyyMMdd` for DAY). When set, this value overrides the format derived from the time unit, while the partition granularity still follows `table.auto-partition.time-unit`. A custom format must use zero-padded numeric fields covering at least the unit's precision so that partition values sort by time as strings (e.g. `yyyy-MM-dd` for DAY). |
The table.auto-partition.time-format docs don't mention the character constraints for partition values (Fluss only allows ASCII alphanumerics, _, and -). Formats like yyyy/MM/dd or those producing spaces will be accepted by the formatter but will create invalid partition names (and may fail when creating ZooKeeper nodes). Consider documenting the allowed output character set (and/or explicitly warning against /, ., spaces, etc.).
Suggested change (current line, then proposed replacement):
| table.auto-partition.time-format | String | no | (derived from unit) | The time format used for auto-created partition values. If not set, the format is derived from `table.auto-partition.time-unit` (e.g. `yyyyMMdd` for DAY). When set, this value overrides the format derived from the time unit, while the partition granularity still follows `table.auto-partition.time-unit`. A custom format must use zero-padded numeric fields covering at least the unit's precision so that partition values sort by time as strings (e.g. `yyyy-MM-dd` for DAY). |
| table.auto-partition.time-format | String | no | (derived from unit) | The time format used for auto-created partition values. If not set, the format is derived from `table.auto-partition.time-unit` (e.g. `yyyyMMdd` for DAY). When set, this value overrides the format derived from the time unit, while the partition granularity still follows `table.auto-partition.time-unit`. A custom format must use zero-padded numeric fields covering at least the unit's precision so that partition values sort by time as strings (e.g. `yyyy-MM-dd` for DAY). The formatted partition value must contain only ASCII letters and digits, `_`, or `-`. Do not use formats that produce `/`, `.`, spaces, `:`, or other characters outside this set, because the formatter may accept them but Fluss partition names do not. |
| table.auto-partition.enabled | Boolean | false | Whether enable auto partition for the table. Disable by default. When auto partition is enabled, the partitions of the table will be created automatically. |
| table.auto-partition.key | String | (None) | This configuration defines the time-based partition key to be used for auto-partitioning when a table is partitioned with multiple keys. Auto-partitioning utilizes a time-based partition key to handle partitions automatically, including creating new ones and removing outdated ones, by comparing the time value of the partition with the current system time. In the case of a table using multiple partition keys (such as a composite partitioning strategy), this feature determines which key should serve as the primary time dimension for making auto-partitioning decisions. And If the table has only one partition key, this config is not necessary. Otherwise, it must be specified. |
| table.auto-partition.time-unit | ENUM | DAY | The time granularity for auto created partitions. The default value is `DAY`. Valid values are `HOUR`, `DAY`, `MONTH`, `QUARTER`, `YEAR`. If the value is `HOUR`, the partition format for auto created is yyyyMMddHH. If the value is `DAY`, the partition format for auto created is yyyyMMdd. If the value is `MONTH`, the partition format for auto created is yyyyMM. If the value is `QUARTER`, the partition format for auto created is yyyyQ. If the value is `YEAR`, the partition format for auto created is yyyy. |
| table.auto-partition.time-format | String | (derived from unit) | The time format used for auto-created partition values. If not set, the format is derived from `table.auto-partition.time-unit` (e.g. `yyyyMMdd` for DAY). When set, this value overrides the format derived from the time unit, while the partition granularity still follows `table.auto-partition.time-unit`. A custom format must use zero-padded numeric fields covering at least the unit's precision so that partition values sort by time as strings (e.g. `yyyy-MM-dd` for DAY). |
The table.auto-partition.time-format description doesn’t call out Fluss partition value character restrictions (only [A-Za-z0-9_-] are allowed). Without this, users may choose formats like yyyy/MM/dd that work as DateTimeFormatter patterns but will produce invalid partition names and fail at runtime. Consider adding a brief note about the allowed output characters / disallowed separators.
Suggested change (current line, then proposed replacement):
| table.auto-partition.time-format | String | (derived from unit) | The time format used for auto-created partition values. If not set, the format is derived from `table.auto-partition.time-unit` (e.g. `yyyyMMdd` for DAY). When set, this value overrides the format derived from the time unit, while the partition granularity still follows `table.auto-partition.time-unit`. A custom format must use zero-padded numeric fields covering at least the unit's precision so that partition values sort by time as strings (e.g. `yyyy-MM-dd` for DAY). |
| table.auto-partition.time-format | String | (derived from unit) | The time format used for auto-created partition values. If not set, the format is derived from `table.auto-partition.time-unit` (e.g. `yyyyMMdd` for DAY). When set, this value overrides the format derived from the time unit, while the partition granularity still follows `table.auto-partition.time-unit`. A custom format must use zero-padded numeric fields covering at least the unit's precision so that partition values sort by time as strings (e.g. `yyyy-MM-dd` for DAY). The formatted partition value must contain only characters allowed by Fluss partition values: `[A-Za-z0-9_-]`. Do not use separators or other characters outside this set (for example, `yyyy/MM/dd` is invalid because `/` is not allowed). |
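To make the character restriction concrete, the sketch below formats a sample timestamp and tests the result against the `[A-Za-z0-9_-]` set quoted in the review comments. It is an illustration with a hypothetical class and method name; the real Fluss check lives in its own name validation (the review mentions `TablePath.detectInvalidName`), which this code does not call.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical illustration of checking the characters a time format produces. */
final class PartitionValueCharsetCheck {

    // Character set taken from the review comments; assumed to match Fluss's rule.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_-]+");

    /** Returns true if the sample formats to a value using only allowed characters. */
    static boolean producesValidValue(String pattern, LocalDateTime sample) {
        String value = DateTimeFormatter.ofPattern(pattern, Locale.ROOT).format(sample);
        return ALLOWED.matcher(value).matches();
    }
}
```

With any sample timestamp, `yyyy-MM-dd` and `yyyy_MM_dd` pass this check, while `yyyy/MM/dd`, `yyyy.MM.dd`, and `yyyy-MM-dd HH` are rejected even though `DateTimeFormatter.ofPattern` accepts all of them.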
Purpose
Linked issue: close #3191
Brief change log
Tests
API and Format
Documentation