- Newest
- Most votes
- Most comments
I found a solution to this (at least in my case).
I had an index referencing the partitions (which I did not set up). I deleted the index and then was able to rename the partitions
Partition columns is something very sensitive that should not be changed since normally you'll break the table. If you can it's always better to follow the conventions, creating tables where the directory structure doesn't match the table, is often problems down the line.
If your files follow the convention, you can recreate the table and run REPAIR (see https://docs.aws.amazon.com/athena/latest/ug/msck-repair-table.html), but if the crawler didn't do that, probably means your directories don't have the partition names.
The documentation needs to be updated to say that if there is an index it need to be dropped to be able to rename
Thanks, but I guess I didn't explain the issue clearly. I'm not looking to repartition the data, I'm just looking to assign meaningful names to the partitions discovered by the Glue crawler. The documentation I'm referring to linked above is:
"To change the default names on the console, navigate to the table, choose Edit Schema, and modify the names of the partition columns there."
That doesn't work anymore.
I see your point about the documentation, thanks for raising that. Maybe you could try to rename using ALTER TABLE https://docs.aws.amazon.com/athena/latest/ug/alter-table-replace-columns.html Anyway, it's not ideal when the column names don't match the directories, if you don't follow the standards you will find some limitations and will have to add partitions yourself in the future
I don't really understand how the column name can match the directories in the case of a time-based partition. What's happening is Kinesis Firehose is streaming data into S3 into a directory structure of the year / month / date / hour. The directory names are just the numbers associated with these (i.e., 2023/05/10/12)- all Glue gives us upon crawling that structure is default partition names of partition_0 for the year level, partition_1 for month, etc.
I read that Glue crawlers can interpret directory names in the style of "year=2023" for example in order to attach the desired column names for the partition that way, but if there's a way to configure Firehose to do that then I'm missing it.
FYI I'm told that ALTER TABLE results in an error in this case.
When the naming conventions are not followed, everything is harder. You could live with the generic column names, or define the partitions with the meaningful names and then load the partitions using ALTER TABLE ADD PARTITION commands using the meaningful names to their corresponding paths. It might be possible to update an existing table modifying the metadata directly using the command line API but there is a good change the table gets broken and unusable
Sorry if I'm being clueless here. How can I configure Firehose to "follow the naming conventions" when writing data into S3?
Relevant content
- asked 7 months ago
- Accepted Answerasked 6 months ago
- asked 2 years ago
- AWS OFFICIALUpdated 3 years ago
- AWS OFFICIALUpdated 3 years ago
- AWS OFFICIALUpdated a year ago
That's a great catch, now the partitioned tables created by the crawler and indexed automatically and that what prevents the update. Thus to update the column names would need to delete the index (and created if desired)