Table Format Maintenance & Operations Last updated: May 29, 2026

Iceberg Spark Procedure rewrite_data_files

A Spark SQL stored procedure in Apache Iceberg that consolidates small data files and can apply sorting or Z-order clustering to optimize table layout.



The Iceberg Spark procedure rewrite_data_files is a maintenance procedure run through Spark SQL to compact a table's data files. Over time, streaming ingest or frequent updates write many small files, which slows both query planning and reads (the small-file problem). The procedure rewrites those files toward a target size (commonly 512 MB or 1 GB) and can optionally sort or cluster the data as it rewrites.
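Before compacting, it can help to confirm that a table actually has a small-file problem. The query below is a minimal sketch that reads Iceberg's files metadata table for the db.web_logs table used in the example further down. The prod catalog name and the 128 MB cutoff for "small" are assumptions chosen for illustration, not recommendations from this article.

```sql
/* Summarize data file sizes for the table (content = 0 selects data files, excluding delete files) */
SELECT
    count(*) AS data_file_count,
    round(avg(file_size_in_bytes) / 1048576, 1) AS avg_file_size_mb,
    /* 134217728 bytes = 128 MB; an assumed threshold for "small" */
    sum(CASE WHEN file_size_in_bytes < 134217728 THEN 1 ELSE 0 END) AS files_under_128_mb
FROM prod.db.web_logs.files
WHERE content = 0;
```

A high file count with a low average size suggests the table would benefit from compaction.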

Syntax and Strategies

The procedure is executed using the Spark SQL CALL syntax. It supports several parameters and optimization strategies:

/* Compact db.web_logs with the binpack strategy, targeting 512 MB (536870912-byte) files */
CALL prod.system.rewrite_data_files(
    table => 'db.web_logs',
    strategy => 'binpack',
    options => map('target-file-size-bytes', '536870912')
);

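Supported Strategies:

binpack: the default; combines small files into larger ones close to the target size without changing row order.
sort: rewrites data ordered by the columns given in sort_order (or by the table's own sort order if none is given). Passing zorder(col1, col2) as the sort_order clusters rows by Z-order across several columns.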
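Besides binpack, the procedure accepts a sort strategy whose sort_order argument takes either a plain ordering or a zorder(...) expression. The calls below are a sketch of both forms; the column names event_time, user_id, and status_code are hypothetical and would need to match columns in the real table.

```sql
/* Rewrite files sorted by a single column (event_time is an assumed column name) */
CALL prod.system.rewrite_data_files(
    table => 'db.web_logs',
    strategy => 'sort',
    sort_order => 'event_time DESC NULLS LAST'
);

/* Cluster rows by Z-order across two columns (user_id and status_code are assumed column names) */
CALL prod.system.rewrite_data_files(
    table => 'db.web_logs',
    strategy => 'sort',
    sort_order => 'zorder(user_id, status_code)'
);
```

Sorting costs more than binpack because rows are shuffled, so it tends to pay off mainly when queries filter on the sorted or clustered columns.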

Performance Tuning

Compaction scope can be limited by passing a filter predicate in the where argument. This restricts the rewrite to matching files, for example partitions holding older historical data, and leaves actively written partitions untouched to reduce commit conflicts with concurrent writers.
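As a sketch of scoping a compaction run, the call below passes a predicate string through the where argument so that only older data is rewritten. The event_date column and the cutoff date are assumptions for illustration; the predicate must reference columns that exist in the table, and the min-input-files value shown is only an example setting.

```sql
/* Compact only data older than the cutoff; event_date is an assumed partition column */
CALL prod.system.rewrite_data_files(
    table => 'db.web_logs',
    strategy => 'binpack',
    where => 'event_date < "2024-01-01"',
    options => map(
        'target-file-size-bytes', '536870912',  /* 512 MB target */
        'min-input-files', '5'                  /* skip file groups with fewer than 5 input files */
    )
);
```

Keeping the predicate aligned with partition boundaries makes it less likely that the rewrite touches files a streaming writer is still appending to.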


โ† Back to Iceberg Knowledge Base