Modern Lakehouse Concepts & Interoperability Last updated: May 29, 2026

Write Amplification

A performance metric measuring the ratio of physical data bytes written to storage compared to the logical data bytes updated by a write transaction.



Write Amplification is a metric that measures the amount of physical data written to storage relative to the size of the logical changes committed by a user or ETL job. In data lakehouse architectures, write amplification affects write throughput, ingestion latencies, and object storage costs.
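The ratio described above can be written down directly. The following is a minimal sketch; the function name and the sample byte counts are illustrative assumptions, not figures from any particular engine.

```python
def write_amplification(physical_bytes_written: int, logical_bytes_changed: int) -> float:
    """Return physical bytes written per logical byte changed."""
    if logical_bytes_changed <= 0:
        raise ValueError("logical_bytes_changed must be positive")
    return physical_bytes_written / logical_bytes_changed


# Hypothetical ETL job: 10 MB of logical changes caused 250 MB of physical writes.
factor = write_amplification(250_000_000, 10_000_000)
print(f"write amplification: {factor:.1f}x")
```

A factor close to 1 means the job wrote little more than the data it changed; larger factors mean more bytes sent to object storage per logical change.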

Ingestion Strategies and Amplification

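In Apache Iceberg, write amplification is heavily influenced by the chosen write strategy, which is set per operation through the table properties write.delete.mode, write.update.mode, and write.merge.mode: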

1. Copy-on-Write (CoW)

Under CoW, updating a single row requires rewriting the entire data file containing that row:

  Update 1 Row ──> [Rewrite 128 MB Data File] (High Write Amplification)

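If a table has a target file size of 128 MB, updating a single record can rewrite about 128,000,000 bytes. The write amplification factor is that figure divided by the size of the logical change: 128,000,000 for a 1-byte change, or roughly 128,000 for a 1 KB row.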
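To make the arithmetic concrete, here is a small sketch of how the CoW factor depends on the size of the row being changed. It assumes a decimal 128 MB file (128,000,000 bytes), that exactly one full-size data file is rewritten, and it ignores metadata and manifest writes; the function name and row sizes are hypothetical.

```python
TARGET_FILE_BYTES = 128_000_000  # assumed: 128 MB target file size, decimal megabytes


def cow_amplification(row_bytes: int, file_bytes: int = TARGET_FILE_BYTES) -> float:
    """Amplification when changing one row forces a rewrite of one whole data file."""
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    return file_bytes / row_bytes


# The factor shrinks as the logical change grows, but stays very large for single rows.
for row_bytes in (1, 100, 1_000):
    print(f"{row_bytes:>5} B changed -> {cow_amplification(row_bytes):,.0f}x")
```

The factor falls as more rows in the same file are changed in one commit, which is why batching updates that touch the same files tends to help under CoW.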

2. Merge-on-Read (MoR)

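MoR minimizes write amplification by leaving existing data files untouched. Deletes are written to separate, small delete files, and an update is recorded as a delete plus the new row version written to a new data file: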

  Update 1 Row ──> [Write 1 KB Position Delete File] (Low Write Amplification)

This keeps ingestion fast but shifts the cost to query time: readers must apply the delete files to the data files they scan until compaction rewrites them.
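The two strategies can be compared side by side with a rough model. This sketch reuses the 128 MB and 1 KB figures from the diagrams and assumes a 1 KB row, that a MoR update writes one delete file plus the new row version, and that file-format and metadata overhead can be ignored. All names are hypothetical.

```python
TARGET_FILE_BYTES = 128_000_000  # assumed: one 128 MB data file rewritten under CoW
DELETE_FILE_BYTES = 1_000        # assumed: one 1 KB position delete file under MoR


def cow_bytes_written(files_touched: int, file_bytes: int = TARGET_FILE_BYTES) -> int:
    """CoW: every data file containing a changed row is rewritten in full."""
    return files_touched * file_bytes


def mor_bytes_written(rows_changed: int, row_bytes: int,
                      delete_file_bytes: int = DELETE_FILE_BYTES) -> int:
    """MoR: one small delete file plus the new row versions (simplified model)."""
    return delete_file_bytes + rows_changed * row_bytes


row_bytes = 1_000            # hypothetical 1 KB row
logical_bytes = row_bytes    # one row updated

cow_factor = cow_bytes_written(1) / logical_bytes
mor_factor = mor_bytes_written(1, row_bytes) / logical_bytes
print(f"CoW: {cow_factor:,.0f}x  MoR: {mor_factor:,.0f}x")
```

The model covers only the write side. The bytes MoR saves at write time are paid back later, either by readers merging delete files or by a compaction job that rewrites the affected data files.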

Other Sources of Write Amplification
