Dremio-Specific Engine & Optimizations Last updated: May 29, 2026

Dremio Metadata Caching

Dremio Metadata Caching is the process of storing table metadata (such as schemas, partition statistics, and file lists) locally on coordinator nodes to accelerate query planning and bypass object storage request latency.

Dremio Metadata Caching is an optimization that stores table structures, file locations, schemas, and partition statistics locally on Dremio coordinator nodes. During query planning, the coordinator must evaluate the composition of a table (such as its file list, column bounds, and data sizes) to construct an efficient query plan.

Retrieving this metadata directly from cloud object storage (such as Amazon S3, Azure Data Lake Storage, or Google Cloud Storage) or external database catalogs on every query introduces high latency. Dremio Metadata Caching keeps this information in local coordinator storage, enabling rapid query planning and sub-second execution starts.

Why Metadata Caching is Critical

Without metadata caching, the coordinator must list files and read table metadata from remote storage each time a query is planned, so every query pays that latency before execution can begin.

How Dremio Caches Table Metadata

Dremio’s coordinator handles metadata tracking based on the data source type:

For Raw File Directories

When querying raw files (such as directories containing Parquet or CSV logs), Dremio automatically scans the directories, learns the schemas, and caches the file listings and block locations.
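As a rough illustration of what "scan, learn the schema, cache the listing" involves, the sketch below walks a directory once and records each file's relative path and size plus the column names from the first CSV header. It is a simplified stand-in and not Dremio's implementation: `scan_directory`, the sample path, and the sample columns are hypothetical, and only CSV files are handled.

```python
import csv
import os
import tempfile


def scan_directory(root):
    """Walk root once and return the file listing plus the learned column names."""
    files = []
    columns = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".csv"):
                continue
            path = os.path.join(dirpath, name)
            files.append((os.path.relpath(path, root), os.path.getsize(path)))
            if columns is None:
                # Learn the schema from the header row of the first file seen.
                with open(path, newline="") as handle:
                    columns = next(csv.reader(handle))
    files.sort()
    return {"files": files, "columns": columns or []}


# Build a tiny sample directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2026-05"))
    sample = os.path.join(root, "2026-05", "orders.csv")
    with open(sample, "w", newline="") as handle:
        csv.writer(handle).writerows([["order_id", "amount"], [1, "9.50"]])
    cached = scan_directory(root)

print(cached["columns"])
print(cached["files"])
```

Once `cached` exists, later planning steps can consult it instead of listing the directory again, which is the saving the section describes.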

For Apache Iceberg Tables

Apache Iceberg tables maintain their own metadata structure (manifest lists, manifest files, and .metadata.json files). Dremio’s coordinator reads these Iceberg files and caches their contents locally. During query planning:

  1. Cache Lookup: Dremio checks the local cache for the table’s current snapshot and manifest definitions.
  2. Split Planning: The query planner uses the cached manifest information to partition data and calculate file splits.
  3. Bypassing Object Storage Reads: Because the manifests are already cached, the planner does not need to fetch them from object storage such as S3, so planning can complete in milliseconds.
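The three steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Dremio's planner: `ObjectStore`, `Coordinator`, `DataFile`, the file names, and the fixed-size byte-range splitting are all hypothetical. The read counter shows that the second planning pass never touches remote storage.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    path: str
    size_bytes: int


@dataclass
class ObjectStore:
    """Stand-in for remote storage; counts how often manifests are fetched."""
    manifests: dict  # table name -> (snapshot_id, list of DataFile)
    reads: int = 0

    def read_manifest(self, table):
        self.reads += 1
        return self.manifests[table]


@dataclass
class Coordinator:
    store: ObjectStore
    cache: dict = field(default_factory=dict)

    def plan(self, table, split_bytes):
        # 1. Cache lookup: reuse the cached snapshot and file list if present.
        if table not in self.cache:
            self.cache[table] = self.store.read_manifest(table)
        snapshot_id, files = self.cache[table]
        # 2. Split planning: cut each file into byte ranges of at most split_bytes.
        splits = []
        for f in files:
            for start in range(0, f.size_bytes, split_bytes):
                length = min(split_bytes, f.size_bytes - start)
                splits.append((f.path, start, length))
        # 3. No remote read happened on this call if the cache was already warm.
        return snapshot_id, splits


store = ObjectStore({
    "analytics.orders": (1, [DataFile("part-0.parquet", 250),
                             DataFile("part-1.parquet", 100)]),
})
coordinator = Coordinator(store)
first = coordinator.plan("analytics.orders", split_bytes=100)
second = coordinator.plan("analytics.orders", split_bytes=100)
print(store.reads)  # remote manifest reads across both planning calls
```

Both calls return the same snapshot and splits, but only the first one reads from the simulated store.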

Cache Refresh Configurations

To keep cached metadata accurate, Dremio provides configurable policies that control how the cache is synchronized with the source. A refresh can also be triggered manually for a single dataset:

ALTER PDS analytics.orders REFRESH METADATA;
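The interplay between a time-based refresh policy and a manual refresh like the statement above can be sketched with a small time-to-live cache. This is an illustrative model only: `MetadataCache`, its `ttl_seconds` parameter, and the injected clock are hypothetical names and do not correspond to Dremio settings or internals.

```python
class MetadataCache:
    """Toy cache: entries expire after ttl_seconds or when refreshed explicitly."""

    def __init__(self, loader, ttl_seconds, clock):
        self._loader = loader    # fetches fresh metadata from the source
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be shown without sleeping
        self._entries = {}       # table -> (loaded_at, metadata)

    def get(self, table):
        entry = self._entries.get(table)
        if entry is None or self._clock() - entry[0] >= self._ttl:
            return self.refresh(table)
        return entry[1]

    def refresh(self, table):
        # Manual refresh: reload regardless of how old the cached entry is.
        metadata = self._loader(table)
        self._entries[table] = (self._clock(), metadata)
        return metadata


now = [0.0]
source = {"analytics.orders": ["a.parquet"]}
loads = []


def load(table):
    loads.append(table)
    return list(source[table])


cache = MetadataCache(load, ttl_seconds=3600, clock=lambda: now[0])
before = cache.get("analytics.orders")           # first access loads from the source
source["analytics.orders"].append("b.parquet")   # a new file lands in storage
stale = cache.get("analytics.orders")            # within the TTL: cached list is served
forced = cache.refresh("analytics.orders")       # manual refresh sees the new file
now[0] += 3600                                   # the TTL elapses
source["analytics.orders"].append("c.parquet")
expired = cache.get("analytics.orders")          # expired entry is reloaded
```

The `stale` read shows the cost of caching: files added after the last load stay invisible until the entry expires or is refreshed, which is why the expiry interval and manual refreshes matter for tables that change often.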

For Iceberg tables managed through REST catalogs (such as Apache Polaris or other managed catalogs), each commit updates the catalog’s pointer to the table’s current metadata file. Dremio can resolve the latest snapshot through the catalog, so queries see committed changes without waiting for a scheduled metadata refresh.
