Can no longer rename partitions in a glue table?


Hi, it seems that sometime in the past couple of weeks, the ability to rename a partition column in a Glue table created by a crawler has been removed.

The documentation still claims you can do this, but if I edit a Glue table in the console, change a partition column name (which the UI allows), and then save the new table version, I'm met with this error:

InvalidInputException (status: 400): Trying to change partitionColumn name from: partition_0 to new partitionColumn name: year. Change of partitionColumn names is not allowed.

In my case, I am streaming log data into S3 partitioned by hour, which is an important optimization for querying it efficiently. But I'm forced to use these partitions by referring to "year" in my Athena queries as "partition_0", month as "partition_1", etc. This seems silly, and I can't find a workaround. I tried setting up an ETL job in Glue to do this in a more modern way, but it doesn't see the partition columns at all.

Is this a bug that will be fixed, is there some new way to transform the partition column names, or am I just out of luck now?

Sundog
asked a year ago · 1066 views
2 Answers

I found a solution to this (at least in my case).

I had a partition index referencing the partition columns (which I did not set up). I deleted the index and was then able to rename the partitions.
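For reference, the partition index can also be inspected and dropped from the Glue CLI; a sketch (the database and table names are hypothetical, and the index name comes from the first command's output):

```
aws glue get-partition-indexes --database-name logs_db --table-name logs
aws glue delete-partition-index --database-name logs_db --table-name logs --index-name <index-name>
```

After the rename, the index can be recreated with `aws glue create-partition-index` if it is still wanted.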

answered a year ago
  • That's a great catch. The partitioned tables created by the crawler are indexed automatically, and that index is what prevents the update. So to rename the partition columns, you need to delete the index first (and recreate it afterwards if desired).


Partition columns are sensitive and normally should not be changed, since doing so usually breaks the table. Where possible, it's better to follow the conventions: creating tables whose directory structure doesn't match the table definition often causes problems down the line.
If your files follow the convention, you can recreate the table and run MSCK REPAIR TABLE (see https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html), but if the crawler didn't pick the names up, it probably means your directories don't include the partition names.
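To illustrate the convention-following path: if the S3 keys use Hive-style name=value directories, a table with meaningful partition names can be created and its partitions discovered automatically. A sketch (the table, column, and bucket names here are hypothetical):

```sql
-- Assumes paths like s3://my-log-bucket/year=2023/month=05/day=10/hour=12/
CREATE EXTERNAL TABLE logs (
  message string
)
PARTITIONED BY (year string, month string, day string, hour string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-log-bucket/';

-- Discovers partitions from the name=value directory names
MSCK REPAIR TABLE logs;
```

MSCK REPAIR only finds partitions whose directories follow the name=value convention, which is exactly why it doesn't help with Firehose's bare numeric paths.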

The documentation needs to be updated to say that if there is a partition index, it needs to be dropped before the partition columns can be renamed.

AWS
EXPERT
answered a year ago
  • Thanks, but I guess I didn't explain the issue clearly. I'm not looking to repartition the data; I'm just looking to assign meaningful names to the partitions discovered by the Glue crawler. The documentation I'm referring to (linked above) says:

    "To change the default names on the console, navigate to the table, choose Edit Schema, and modify the names of the partition columns there."

    That doesn't work anymore.

  • I see your point about the documentation, thanks for raising that. Maybe you could try to rename the columns using ALTER TABLE: https://docs.aws.amazon.com/athena/latest/ug/alter-table-replace-columns.html. In any case, it's not ideal when the column names don't match the directories; if you don't follow the standards you will run into limitations and will have to add partitions yourself in the future.

  • I don't really understand how the column names can match the directories in the case of a time-based partition. What's happening is that Kinesis Firehose is streaming data into S3 in a directory structure of year/month/day/hour. The directory names are just the numbers for each level (i.e., 2023/05/10/12). All Glue gives us upon crawling that structure is the default partition names: partition_0 for the year level, partition_1 for month, etc.

    I read that Glue crawlers can interpret directory names in the style of "year=2023" for example in order to attach the desired column names for the partition that way, but if there's a way to configure Firehose to do that then I'm missing it.

    FYI I'm told that ALTER TABLE results in an error in this case.
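One way around both problems is to script the partition registration: parse the numeric Firehose prefixes and emit ALTER TABLE ADD PARTITION statements with meaningful names. A minimal sketch, assuming the year/month/day/hour layout described above (the table and bucket names are hypothetical):

```python
# Sketch: map Firehose's numeric S3 prefixes (e.g. "2023/05/10/12/...")
# to named partition values and an Athena ADD PARTITION statement.
import re

KEY_PATTERN = re.compile(r"^(\d{4})/(\d{2})/(\d{2})/(\d{2})/")

def partition_values(key: str) -> dict:
    """Extract year/month/day/hour from an object key like '2023/05/10/12/file.gz'."""
    m = KEY_PATTERN.match(key)
    if m is None:
        raise ValueError(f"key does not match year/month/day/hour layout: {key}")
    year, month, day, hour = m.groups()
    return {"year": year, "month": month, "day": day, "hour": hour}

def add_partition_ddl(table: str, bucket: str, key: str) -> str:
    """Build an ALTER TABLE ADD PARTITION statement for one hourly prefix."""
    vals = partition_values(key)
    spec = ", ".join(f"{k}='{v}'" for k, v in vals.items())
    location = (f"s3://{bucket}/{vals['year']}/{vals['month']}/"
                f"{vals['day']}/{vals['hour']}/")
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({spec}) LOCATION '{location}'")

print(add_partition_ddl("logs", "my-log-bucket", "2023/05/10/12/part-0000.gz"))
```

The generated statements could then be run through Athena (for example on a schedule) to register each new hour as it lands.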

  • When the naming conventions are not followed, everything is harder. You could live with the generic column names, or define the partitions with meaningful names and then load them using ALTER TABLE ADD PARTITION commands that map the meaningful names to their corresponding paths. It might be possible to update an existing table by modifying the metadata directly through the command-line API, but there is a good chance the table ends up broken and unusable.
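To make the ADD PARTITION approach concrete: the table is declared with meaningful partition names but each partition is pointed at the existing numeric path by hand. A sketch (the table, column, and bucket names are hypothetical):

```sql
CREATE EXTERNAL TABLE logs (
  message string
)
PARTITIONED BY (year string, month string, day string, hour string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-log-bucket/';

-- Each partition must be registered explicitly, since the paths
-- lack the name=value convention that MSCK REPAIR relies on
ALTER TABLE logs ADD IF NOT EXISTS
  PARTITION (year='2023', month='05', day='10', hour='12')
  LOCATION 's3://my-log-bucket/2023/05/10/12/';
```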

  • Sorry if I'm being clueless here. How can I configure Firehose to "follow the naming conventions" when writing data into S3?
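On the Firehose side, the custom S3 prefix feature can emit Hive-style name=value directories using timestamp expressions, which lets the crawler pick up the intended column names. A sketch of the delivery stream's S3 destination settings (the error prefix is required whenever a custom prefix is set; the exact values here are an assumption about this setup):

```json
{
  "Prefix": "year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/",
  "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/"
}
```

Note this only affects newly delivered objects; existing data would still need to be re-keyed or registered manually.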
