Skip to content

How to add partition fields to Iceberg table

0

I have a created an Iceberg in the Glue catalog. No partition spec was defined at the time of creation but as we are ingesting more data, a few partition fields must be added. It seems like Athena powered EMR studio query editor does not support ALTER .. ADD PARTITION FIELD command. How can I add partition fields to an existing Iceberg table?

asked 9 months ago1.6K views
1 Answer
1

To add partition fields to an existing Iceberg table in AWS Glue catalog, you have a few options:

  1. Using Apache Spark: If you have access to Apache Spark (e.g., through Amazon EMR), you can use Spark SQL to alter the table and add partition fields. The syntax would be similar to:

    ALTER TABLE catalog_name.database_name.table_name
    ADD PARTITION FIELD field_name
    

    You would need to run this Spark SQL command for each partition field you want to add.

  2. Using AWS Glue ETL jobs: You can create an AWS Glue ETL job to update the table schema and add partition fields. In your Glue ETL script, you would:

    a. Read the existing table data b. Define the new schema with partition fields c. Write the data back to a new or the same table with the updated schema

    When writing the data, make sure to set enableUpdateCatalog to true and updateBehavior to UPDATE_IN_DATABASE in your job options. This will update the schema in the Glue Data Catalog.

  3. Using AWS Glue Crawler: After adding the partition data to your S3 location, you can run an AWS Glue Crawler configured to update the table definition. The crawler will detect the new partition structure and update the table metadata accordingly.

  4. Manual update through AWS Glue Console: As a last resort, you can manually edit the table schema through the AWS Glue Console. Navigate to the table in the Glue Data Catalog, edit the schema, and add the new partition fields.

Remember that adding partition fields to an existing Iceberg table may require rewriting the data to organize it according to the new partition structure. This operation can be resource-intensive for large datasets.

If Athena powered EMR studio query editor doesn't support the ALTER TABLE ADD PARTITION FIELD command for Iceberg tables, consider using one of the above methods, particularly Spark SQL through EMR or a Glue ETL job, as these are more aligned with Iceberg's native operations.
Sources
Working with Apache Iceberg tables by using Apache Spark - AWS Prescriptive Guidance
Updating the schema, and adding new partitions in the Data Catalog using AWS Glue ETL jobs - AWS Glue

answered 9 months ago
EXPERT
reviewed 9 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.