The issue you're experiencing stems from the difference in how Athena and AWS Glue handle partition projection.
When you use partition projection in Athena, it's important to understand that Athena ignores any partition metadata registered in the AWS Glue Data Catalog. Instead, Athena uses the table properties you've defined to determine partition values and locations. This is why you're able to query the data successfully in Athena despite not seeing any partitions in the AWS Glue console.
However, AWS Glue jobs don't use Athena's partition projection. They rely on the actual partition metadata stored in the AWS Glue Data Catalog. Since your table is using partition projection and there are no partitions registered in Glue, your Glue job is unable to resolve the partition columns, leading to the error you're seeing.
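To make the contrast concrete, here is a rough sketch of how projection derives a partition location from table properties alone, with no catalog lookup. The property keys are the standard partition projection ones, but the bucket, prefix, date range, and the `yyyy/MM/dd` format are assumptions for illustration, and `projected_location` is a hypothetical helper that only imitates what Athena does internally.

```python
from datetime import date

# Assumed table properties for a date-projected column named calendarday.
# The bucket, prefix, and range are placeholders, not values from the question.
TABLE_PROPERTIES = {
    "projection.enabled": "true",
    "projection.calendarday.type": "date",
    "projection.calendarday.format": "yyyy/MM/dd",
    "projection.calendarday.range": "2024/01/01,NOW",
    "storage.location.template": "s3://example-bucket/data/${calendarday}/",
}


def projected_location(properties, day):
    """Fill the location template for one day (simplified imitation of projection)."""
    value = day.strftime("%Y/%m/%d")  # mirrors the yyyy/MM/dd format assumed above
    return properties["storage.location.template"].replace("${calendarday}", value)


print(projected_location(TABLE_PROPERTIES, date(2024, 8, 1)))
```

Nothing in this calculation touches the Data Catalog's partition list, which is why the Glue console can show zero partitions while Athena queries still succeed.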
To resolve this issue, you have a few options:
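- If you want to continue using partition projection with Athena while also being able to query the data with Glue jobs, you'll need to add the partitions to the AWS Glue Data Catalog yourself. You can do this with the AWS Glue API, with ALTER TABLE ADD PARTITION statements in Athena that give each partition an explicit LOCATION, or by running MSCK REPAIR TABLE in Athena after changing your S3 paths to the key=value format (as you've already discovered). Note that MSCK REPAIR TABLE only discovers paths in key=value format.
- If you prefer not to use the key=value format in your S3 path, you could use a Glue crawler to populate the partition metadata in the Data Catalog from your current S3 path structure. For paths that are not in key=value format, the crawler names the partition columns generically (partition_0, partition_1, and so on), so 2024/08/01 would become three columns that you would need to rename or combine to get a single calendarday value. A custom classifier controls how the crawler recognizes the data format rather than how it names partitions.
- Another option is to modify your Glue job to read directly from S3 instead of using the Glue Data Catalog. This way, you can specify the S3 path and partition structure explicitly in your Glue job code.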
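As a sketch of the first option, the snippet below builds one catalog partition per day and registers them through the Glue API with boto3. The database, table, and bucket names passed in are up to you, the `%Y/%m/%d` partition value format is an assumption, and both helper functions are hypothetical. `get_table` and `batch_create_partition` are existing Glue client calls, and `batch_create_partition` accepts up to 100 partitions per request.

```python
import copy
from datetime import date, timedelta


def build_partition_inputs(table_sd, base_location, start, end, value_format="%Y/%m/%d"):
    """Build Glue PartitionInput dicts for every day from start to end, inclusive."""
    inputs = []
    day = start
    while day <= end:
        # Reuse the table's storage descriptor so format and SerDe settings match.
        sd = copy.deepcopy(table_sd)
        sd["Location"] = base_location.rstrip("/") + "/" + day.strftime("%Y/%m/%d") + "/"
        inputs.append({"Values": [day.strftime(value_format)], "StorageDescriptor": sd})
        day += timedelta(days=1)
    return inputs


def register_partitions(database, table, base_location, start, end):
    """Register the daily partitions in the Data Catalog (needs AWS credentials)."""
    import boto3  # imported here so the builder above runs without AWS access

    glue = boto3.client("glue")
    table_sd = glue.get_table(DatabaseName=database, Name=table)["Table"]["StorageDescriptor"]
    inputs = build_partition_inputs(table_sd, base_location, start, end)
    for i in range(0, len(inputs), 100):  # stay within the per-call batch limit
        glue.batch_create_partition(
            DatabaseName=database,
            TableName=table,
            PartitionInputList=inputs[i:i + 100],
        )
```

Because Athena ignores catalog partitions while projection is enabled, registering them this way should not change Athena's behavior. It only gives Glue jobs something to resolve. You would need to rerun it (or schedule it) as new days arrive.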
It's worth noting that the partition projection feature in Athena is designed to improve query performance by avoiding the need to retrieve partition metadata from the Glue Data Catalog. However, this optimization is specific to Athena and isn't recognized by other services like AWS Glue.
In your specific case, where you want Glue to recognize 2024/08/01 as the calendarday partition column automatically without the key=value format, you might need to implement a custom solution in your Glue job. This could involve parsing the S3 path to extract the date components and constructing the calendarday value programmatically.
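A minimal sketch of that parsing step, assuming each object key contains a `/yyyy/MM/dd/` segment and that you want calendarday as an ISO date string. The function name and the example path are hypothetical.

```python
import re
from datetime import date

# Matches a /yyyy/MM/dd/ segment anywhere in the path.
_DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def calendarday_from_path(s3_path):
    """Extract the date segment from an S3 path and return it as YYYY-MM-DD."""
    match = _DATE_IN_PATH.search(s3_path)
    if match is None:
        raise ValueError(f"no yyyy/MM/dd segment found in {s3_path!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() validates the components, so an impossible date raises ValueError.
    return date(year, month, day).isoformat()


print(calendarday_from_path("s3://example-bucket/data/2024/08/01/part-0000.parquet"))
```

In a Spark-based Glue job, the same idea could be applied per row by combining `input_file_name()` with `regexp_extract` from `pyspark.sql.functions`, so the calendarday column is derived from each record's source file path.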
Sources
Use partition projection with Amazon Athena - Amazon Athena
Troubleshoot Athena partition projection issues | AWS re:Post
Troubleshoot Athena queries that have new partitions | AWS re:Post