AWS Glue partitions for data catalog table


I am trying to create a glue ETL of a very large legacy table that has been added by a crawler to the Glue Data Catalog. The Glue documentation states I should be able to see/edit "partitions" for the table in the catalog. I see no options anywhere in the DC to edit/create partitions or specify a column or expression to use for partitioning. Is this functionality available?

https://docs.aws.amazon.com/glue/latest/dg/partition-indexes.html

asked 2 years ago, 6436 views
1 Answer

Hello John,

Unfortunately, it is currently not possible to add or create partitions on a Glue table through the Glue console. However, you can edit the table schema in the console, which lets you add or remove partition columns.

To edit the partition columns, follow the steps below. Note that the partitions themselves would still need to be added to the table afterwards:

  1. Under Data catalog, select Tables, then select the table in which you want to create a partition.
  2. Select the Edit schema option.
  3. Click Add column, enter the preferred name, select the partition key checkbox, then click Add.
  4. Click Save in the top right corner.
  5. Go back to the table details and click the View partitions option to check the partitions.

I see that your Glue table was created by a crawler. Normally, a Glue crawler also adds partitions to the table if the data in S3 is stored in a partitioned layout. I suspect the data in S3 is not stored in a partitioned layout, which would explain why the crawler did not create any partitions.

You can manually add partitions to your table using the following approaches:

  • Add a Glue table partition using the Boto 3 SDK: You can use the AWS Boto 3 SDK to create Glue partitions with the batch_create_partition() or create_partition() APIs.

  • Use the ALTER TABLE ADD PARTITION command: You can run the ALTER TABLE ADD PARTITION SQL command via Athena to add the partitions to the table manually.

https://docs.aws.amazon.com/athena/latest/ug/alter-table-add-partition.html
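As a rough illustration of the Athena approach, the sketch below builds an ALTER TABLE ADD PARTITION statement and optionally submits it through Boto 3. The table name `sales`, the partition keys `year` and `month`, the bucket `example-bucket`, and the helper names are all hypothetical and would need to be replaced with your own values. It also assumes the data is laid out in S3 under one prefix per partition.

```python
def add_partition_ddl(table, partition, location):
    """Build an Athena ALTER TABLE ADD PARTITION statement.

    `partition` maps partition column names to values (hypothetical
    example: {"year": "2023", "month": "01"}).
    """
    spec = ", ".join(f"{key} = '{value}'" for key, value in partition.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


def run_in_athena(sql, database, output_location):
    """Submit the statement to Athena; needs AWS credentials and boto3."""
    import boto3  # imported lazily so the DDL helper works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical table, partition values, and bucket, for illustration only.
ddl = add_partition_ddl(
    "sales",
    {"year": "2023", "month": "01"},
    "s3://example-bucket/sales/year=2023/month=01/",
)
print(ddl)
```

The string builder does no escaping, so it is only suitable for trusted table names and partition values.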

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_partition
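For the Boto 3 approach, a minimal sketch might look like the following. It copies the table's existing StorageDescriptor, points it at the partition's S3 prefix, and passes the result to create_partition(). The helper names and the database, table, and bucket names in the usage comment are assumptions made for illustration. The approach also assumes the partition should reuse the table's columns and SerDe settings.

```python
import copy


def build_partition_input(table_sd, values, location):
    """Derive a PartitionInput dict from the table's StorageDescriptor.

    The table descriptor is deep-copied so the caller's dict is left
    untouched; only the Location is changed for the new partition.
    """
    sd = copy.deepcopy(table_sd)
    sd["Location"] = location
    # Glue stores partition values as strings, in partition-key order.
    return {"Values": [str(v) for v in values], "StorageDescriptor": sd}


def add_partition(database, table, values, location):
    """Register one partition in the Glue Data Catalog (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    table_sd = glue.get_table(DatabaseName=database, Name=table)["Table"][
        "StorageDescriptor"
    ]
    glue.create_partition(
        DatabaseName=database,
        TableName=table,
        PartitionInput=build_partition_input(table_sd, values, location),
    )


# Hypothetical usage, with placeholder names:
# add_partition("my_db", "sales", ["2023", "01"],
#               "s3://example-bucket/sales/year=2023/month=01/")
```

When many partitions need to be registered, batch_create_partition() accepts a list of such PartitionInput dicts in a single call.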

AWS
SUPPORT ENGINEER
answered 2 years ago
  • Thanks for your comments. Sadly it isn't germane to the issue we are facing as we are using a MySQL database, NOT Athena or any cloud DB. I was able from your tip to delete the datetime column and re-add it to the DC Table schema whilst checking the "partition" checkbox. This caused the "Partitions and Indexes" option to show up under the table but sadly it says "Unsupported partition keys" for the column I specified as a partition column :(
