
Metadata file for iceberg table doesn't update when modify schema using Glue Data catalog UI


Here is my table schema:

[Screenshot: table schema]

When I add a new column (**cole**, for example) and then query the table in Athena, the schema panel on the left shows the update (cole is listed), but the returned data does not include cole in its column list.

[Screenshot: Athena query results, without the cole column]

If I edit the metadata file in S3, the returned data includes the new column. So my question is: why are the schema and the metadata file not in sync? Is there another way to solve this without editing the metadata file?

  • Did you find a way to push down from Glue to Iceberg metadata? I'm experiencing the same problem.

asked 10 months ago · 1.8K views
1 Answer

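The issue you're experiencing is likely due to the way Iceberg tables handle schema evolution and metadata updates. When you modify the schema in the AWS Glue Data Catalog UI, only the column list stored in the Glue catalog changes; the Iceberg metadata file in S3 (the one referenced by the table's metadata_location parameter) is not rewritten. The Athena schema panel shows the Glue column list, but queries use the schema in the Iceberg metadata file, which is why the new column appears in one place and not the other.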

Iceberg schema updates are designed to be metadata-only changes, meaning no data files are modified when you perform a schema update. However, for such a change to be reflected in queries, a new Iceberg metadata file containing the updated schema has to be committed, and editing the columns in the Glue console does not do that.
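To confirm that the two copies of the schema have drifted apart, you can compare the Glue column list with the current schema in the Iceberg metadata file. The following is a minimal sketch, assuming you have already fetched the table with boto3's Glue `get_table` call (the response keys `Table`, `Parameters`, `StorageDescriptor`, `Columns` follow that API) and parsed the metadata.json with `json.loads`. The helper names (`glue_columns`, `iceberg_columns`, `schema_drift`, `metadata_location`) and the sample values are hypothetical.

```python
from typing import Optional


def glue_columns(get_table_response: dict) -> list:
    """Column names stored in the Glue Data Catalog (what the console shows)."""
    table = get_table_response["Table"]
    params = table.get("Parameters", {})
    # Iceberg tables in Glue carry table_type=ICEBERG; compared case-insensitively here.
    if params.get("table_type", "").upper() != "ICEBERG":
        raise ValueError("table_type parameter is not ICEBERG")
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    return [col["Name"] for col in columns]


def metadata_location(get_table_response: dict) -> Optional[str]:
    """S3 URI of the metadata.json that queries take the schema from."""
    return get_table_response["Table"].get("Parameters", {}).get("metadata_location")


def iceberg_columns(metadata: dict) -> list:
    """Column names from the current schema of a parsed Iceberg metadata.json."""
    if "schemas" in metadata:
        # Format v2 (and newer v1 writers): pick the schema marked as current.
        current_id = metadata["current-schema-id"]
        schema = next(s for s in metadata["schemas"] if s["schema-id"] == current_id)
    else:
        # Older v1 files only carry a single top-level "schema" object.
        schema = metadata["schema"]
    return [field["name"] for field in schema["fields"]]


def schema_drift(get_table_response: dict, metadata: dict) -> dict:
    """Report columns that exist on only one side (hypothetical helper)."""
    glue = glue_columns(get_table_response)
    ice = iceberg_columns(metadata)
    return {
        "only_in_glue": [c for c in glue if c not in ice],
        "only_in_iceberg": [c for c in ice if c not in glue],
    }


if __name__ == "__main__":
    # Hand-written sample shaped like the case in the question (made-up values).
    glue_resp = {"Table": {
        "Parameters": {"table_type": "ICEBERG",
                       "metadata_location": "s3://my-bucket/my_table/metadata/00001.metadata.json"},
        "StorageDescriptor": {"Columns": [{"Name": "id", "Type": "int"},
                                          {"Name": "cole", "Type": "string"}]},
    }}
    meta = {"format-version": 2, "current-schema-id": 0, "schemas": [
        {"schema-id": 0, "type": "struct",
         "fields": [{"id": 1, "name": "id", "required": False, "type": "int"}]},
    ]}
    print(metadata_location(glue_resp))
    print(schema_drift(glue_resp, meta))  # cole shows up under only_in_glue
```

With real data, the first argument would come from `boto3.client("glue").get_table(DatabaseName=..., Name=...)` and the second from reading the object at `metadata_location`. A non-empty `only_in_glue` list means the column exists only in the catalog, so Athena will not return it.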

To resolve this issue without manually changing the metadata file, you have a few options:

  1. Use Athena DDL statements: Instead of modifying the schema through the Glue Data Catalog UI, use Athena to alter the table schema. Athena is aware of the Iceberg format and will update both the Glue catalog and the Iceberg metadata appropriately. You can use statements like ALTER TABLE ADD COLUMNS to add new columns.

  2. Trigger a metadata refresh: Running a query on the table in Athena makes it re-read the metadata file currently referenced by the table's metadata_location. Note that this does not write a new metadata file, so it will not pick up a column that was added only in the Glue catalog.

  3. Use AWS Glue crawlers: Glue crawlers support Iceberg tables, but they read the Iceberg metadata in S3 and update the Glue catalog from it, not the other way around. A crawler can bring the catalog back in line with the metadata file, but it will not add a column to the Iceberg metadata.

  4. Verify the Glue version: If you change the schema from AWS Glue ETL jobs, make sure the job uses a Glue version with native Iceberg support. Older versions may not handle Iceberg metadata updates correctly.

  5. Check table properties: Verify that the table properties in Glue correctly identify the table as an Iceberg table. The table_type property should be set to ICEBERG.
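Option 1 can be scripted so that schema changes always go through Athena rather than the Glue console. The following is a minimal sketch: it builds an `ALTER TABLE ... ADD COLUMNS` statement and submits it with an Athena client supplied by the caller (for example `boto3.client("athena")`, whose `start_query_execution` and `get_query_execution` calls are used here). The helper names, database, table, column, and S3 output location are hypothetical placeholders.

```python
import re
import time
from typing import Any

# Plain identifiers only; anything else is rejected rather than quoted.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_TERMINAL = ("SUCCEEDED", "FAILED", "CANCELLED")


def add_columns_ddl(table: str, columns: dict) -> str:
    """Build an Athena ALTER TABLE ... ADD COLUMNS statement for an Iceberg table."""
    for name in [table, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    col_list = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({col_list})"


def run_athena(client: Any, sql: str, database: str, output: str, poll: float = 1.0) -> str:
    """Submit one statement and block until Athena reports a terminal state."""
    qid = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )["QueryExecutionId"]
    while True:
        status = client.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state in _TERMINAL:
            if state != "SUCCEEDED":
                raise RuntimeError(f"{state}: {status.get('StateChangeReason', '')}")
            return qid
        time.sleep(poll)  # still QUEUED or RUNNING


if __name__ == "__main__":
    sql = add_columns_ddl("my_table", {"cole": "string"})
    print(sql)
    # To actually run it (requires AWS credentials and the boto3 package):
    # import boto3
    # run_athena(boto3.client("athena"), sql, "my_db", "s3://my-bucket/athena-results/")
```

Passing the client in, instead of creating it inside the function, keeps the statement builder and the polling loop testable without AWS access. Because the DDL runs through Athena, both the Glue catalog and the Iceberg metadata are updated, as described in option 1.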

By using these methods, particularly using Athena DDL statements for schema changes, you should be able to keep the schema and metadata file in sync without manually modifying the metadata files in S3.
Sources
Evolve Iceberg table schema - Amazon Athena
Create Iceberg tables - Amazon Athena

answered 10 months ago
  • Triggering a metadata refresh, as suggested, did not work. The metadata.json file is still the same.

  • I agree with BRUNO KIM: the refresh doesn't work.
