Iceberg Truncate Partition Transform
The Iceberg truncate partition transform groups rows whose values share the same truncated form into one partition. For string columns, it truncates text to a prefix of the specified length. For numeric columns (integers, longs, decimals), it rounds each value down to the nearest multiple of the specified width, which produces fixed-size intervals. This reduces partition cardinality while keeping related data co-located.
By mapping many source values to a single partition value, this strategy helps avoid the “small file problem” associated with high-cardinality partitions while still allowing engines to skip files during query scans.
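As a rough illustration of the cardinality reduction, the sketch below counts distinct partition values before and after truncation. The range of error codes is made-up sample data, and the arithmetic is a plain Python stand-in for the transform, not Iceberg code.

```python
# Hypothetical sample data: 1,000 distinct error codes (0..999).
codes = range(0, 1000)

# Partitioning on the raw value gives one partition per distinct code.
raw_partitions = len(set(codes))

# Rounding each code down to a multiple of 100 mimics a width of 100.
width = 100
truncated_partitions = len({c - (c % width) for c in codes})

print(raw_partitions, truncated_partitions)  # 1000 10
```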
Syntax and Behavior
In Spark SQL DDL, the transform is written as truncate(width, column), with the width as the first argument:
/* Partition the events table by a 4-character prefix of message and by ranges of 100 on error_code */
CREATE TABLE systems.events (
    event_id bigint,
    error_code int,
    message string
)
USING iceberg
PARTITIONED BY (truncate(4, message), truncate(100, error_code));
Behavior by Data Type:
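- Strings: Truncates the text to the specified number of characters (Unicode code points). For example, truncate(4, message) maps "application_error" to "appl" and "app_crash" to "app_".
- Integers/Longs: Rounds each value down to the nearest multiple of the width, so values fall into ranges of that size. For example, truncate(10, error_code) maps values 0 through 9 to partition 0, and values 10 through 19 to partition 10. Rounding is toward negative infinity, so -1 maps to -10.
- Decimals: Applies the width to the unscaled value, so the interval size depends on the column's scale. For example, for a price column of type decimal(10, 2), truncate(10, price) groups prices in intervals of 0.10 (12.34 maps to 12.30); intervals of 10.00 would require truncate(1000, price).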
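The per-type rules above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic, not Iceberg's implementation; the function names are hypothetical, and the decimal version assumes a finite value whose scale is taken from the Decimal literal itself.

```python
from decimal import Decimal


def truncate_str(width: int, value: str) -> str:
    # Python slices strings by Unicode code point, so this keeps
    # at most `width` characters from the start of the value.
    return value[:width]


def truncate_int(width: int, value: int) -> int:
    # For a positive width, Python's % never returns a negative
    # remainder, so this rounds toward negative infinity.
    return value - (value % width)


def truncate_decimal(width: int, value: Decimal) -> Decimal:
    # The width applies to the unscaled integer (12.34 -> 1234),
    # and the original scale is restored afterwards.
    exponent = value.as_tuple().exponent
    unscaled = int(value.scaleb(-exponent))
    return Decimal(truncate_int(width, unscaled)).scaleb(exponent)


print(truncate_str(4, "application_error"))     # appl
print(truncate_int(10, 19))                     # 10
print(truncate_int(10, -1))                     # -10
print(truncate_decimal(10, Decimal("12.34")))   # 12.30
```

Working on the unscaled integer keeps the decimal case consistent with the integer case, including for negative values.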
Query Optimization
When a query filters on a column that is partitioned with a truncate transform, the query engine applies the same transform to the filter predicate and uses the resulting partition predicate to prune data files during planning:
/* The planner reads only data files whose truncated message partition value is 'appl' */
SELECT * FROM systems.events WHERE message = 'application_error';
Because the transformation is tracked inside the table’s metadata, query engines handle this translation transparently, requiring no manual partition clauses from developers or analytical users.
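To make the predicate translation concrete, here is a minimal sketch of how a filter on the source column could be turned into a filter on the truncated partition value. The function names and the (operator, value) tuple representation are hypothetical and chosen for illustration; this is not the planner's actual code.

```python
def truncate_int(width: int, value: int) -> int:
    # Round down to the nearest multiple of width.
    return value - (value % width)


def project_str_equals(width: int, literal: str) -> tuple[str, str]:
    # col = literal implies truncate(col) = truncate(literal).
    return ("=", literal[:width])


def project_int(width: int, op: str, literal: int) -> tuple[str, int]:
    # Each result must match every partition that could hold a
    # qualifying row, so strict bounds become inclusive ones.
    if op in ("=", ">=", "<="):
        return (op, truncate_int(width, literal))
    if op == ">":
        return (">=", truncate_int(width, literal + 1))
    if op == "<":
        return ("<=", truncate_int(width, literal - 1))
    raise ValueError(f"unsupported operator: {op}")


print(project_str_equals(4, "application_error"))  # ('=', 'appl')
print(project_int(100, ">=", 250))                 # ('>=', 200)
print(project_int(100, "<", 300))                  # ('<=', 200)
```

The projected predicate only selects candidate files. A partition such as 'appl' can also hold other messages that start with the same four characters, so the original filter still has to be applied to the rows that are read.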