Table Format Maintenance & Operations Last updated: May 29, 2026

Iceberg Orphan Files Penalty

The performance and cost overhead incurred when unreferenced, abandoned files accumulate in object storage due to failed transactions or aborted compaction jobs.



The Iceberg Orphan Files Penalty refers to the storage cost and operational overhead that accumulates when unreferenced physical files are left behind in a table’s storage directory. Because Apache Iceberg defines table state strictly through metadata pointers, any file written during a failed transaction, aborted compaction, or crashed stream remains in storage but is not tracked by the catalog.

Without maintenance, these orphan files remain in the storage bucket indefinitely, incurring ongoing costs and degrading operational tooling performance.
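Conceptually, an orphan is any file that appears in a storage listing but is not reachable from table metadata. The sketch below illustrates that set difference under simplifying assumptions: the function name, paths, and timestamps are hypothetical, the storage listing and the referenced set are supplied by the caller, and an age cutoff is applied so that files from commits still in flight are not flagged.

```python
from datetime import datetime, timedelta, timezone


def find_orphans(listed, referenced, older_than):
    """Return paths present in storage but absent from table metadata.

    listed:     dict mapping path -> last-modified datetime (storage listing)
    referenced: set of paths reachable from table metadata
    older_than: only files modified before this cutoff are reported, so
                files belonging to commits still in flight are left alone
    """
    return sorted(
        path
        for path, modified in listed.items()
        if path not in referenced and modified < older_than
    )


# Hypothetical listing of a table's data directory.
now = datetime(2026, 5, 29, tzinfo=timezone.utc)
listed = {
    "s3://bucket/tbl/data/a.parquet": now - timedelta(days=10),  # committed
    "s3://bucket/tbl/data/b.parquet": now - timedelta(days=9),   # failed commit
    "s3://bucket/tbl/data/c.parquet": now - timedelta(hours=1),  # may be in flight
}
referenced = {"s3://bucket/tbl/data/a.parquet"}

orphans = find_orphans(listed, referenced, older_than=now - timedelta(days=3))
print(orphans)
```

Real tooling also has to reconcile path schemes and authorities before comparing, which this sketch ignores.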

Causes of Orphan Files

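Orphan files are commonly generated by the following events:

  1. Failed transactions: a writer uploads data files, but the commit never succeeds, so no metadata points to them.
  2. Aborted compaction jobs: rewritten files are left behind when the job stops before committing.
  3. Crashed streaming writers: files written before the crash are never committed to the table.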

The Penalties Incurred

  1. Storage Costs: Since data files can be large, hundreds of failed commits can write terabytes of untracked data, inflating monthly cloud storage bills.
  2. Tooling Degradation: Backup utilities, replication scripts, or security scanners that traverse the physical directory must process millions of unreferenced files, which lengthens their run times.
  3. Audit Discrepancies: Physical directory size scans will show different storage numbers than those reported by the .files metadata table, complicating audit reports.
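The storage and audit penalties can be estimated with simple arithmetic: the gap between the physical directory size and the bytes referenced by metadata, multiplied by a storage price. The sketch below assumes you have already measured both byte counts; the function name, the sizes, and the price of 0.023 USD per GiB-month are illustrative assumptions, not figures from any provider or from this article. The referenced total should cover files from every retained snapshot, since the files metadata table only reflects the current one.

```python
def orphan_overhead(physical_bytes, referenced_bytes, usd_per_gib_month):
    """Estimate untracked bytes and their monthly storage cost.

    physical_bytes:    total size from a scan of the table directory
    referenced_bytes:  total size of files referenced by table metadata
    usd_per_gib_month: assumed storage price (hypothetical input)
    """
    orphan_bytes = max(physical_bytes - referenced_bytes, 0)
    orphan_gib = orphan_bytes / 1024**3
    return orphan_bytes, orphan_gib * usd_per_gib_month


# Hypothetical audit: 12 TiB on disk, 9 TiB referenced by metadata.
TIB = 1024**4
orphan_bytes, monthly_cost = orphan_overhead(12 * TIB, 9 * TIB, 0.023)
print(f"{orphan_bytes / TIB:.1f} TiB untracked, about ${monthly_cost:.2f} per month")
```

The same difference is the discrepancy an auditor would see between a directory scan and metadata-reported sizes.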

Data teams avoid this penalty by running the remove_orphan_files procedure on a regular schedule, for example weekly, to delete unreferenced files from storage.
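As a rough illustration, a scheduled job might build the Spark SQL call for the procedure and run it in dry-run mode first. The helper below is a hypothetical convenience function, and the catalog name `my_catalog` and table `db.events` are placeholders. It only constructs the statement, so it runs without a Spark session, and the three-day cutoff is chosen here to avoid touching files from writes still in progress.

```python
from datetime import datetime, timedelta, timezone


def remove_orphans_sql(catalog, table, older_than, dry_run=True):
    """Build a CALL statement for Iceberg's remove_orphan_files procedure.

    dry_run=True asks the procedure to list candidate files without
    deleting them, which is a safer first step for a new schedule.
    """
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{ts}', "
        f"dry_run => {str(dry_run).lower()})"
    )


# Placeholder names; the cutoff is three days before a fixed example date.
cutoff = datetime(2026, 5, 29, tzinfo=timezone.utc) - timedelta(days=3)
stmt = remove_orphans_sql("my_catalog", "db.events", cutoff)
print(stmt)

# With a Spark session configured for Iceberg, the statement could be run as:
# spark.sql(stmt).show(truncate=False)
```

Reviewing the dry-run output before switching `dry_run` to false helps confirm that nothing still needed by a concurrent writer is about to be deleted.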

