This issue may stem from how AWS Glue caches schema information and how it interacts with the Data Catalog. Even though you updated the S3 dataset and re-ran the crawler, which updated the Glue table definition, a propagation delay or a cached schema might be preventing the Glue job from seeing the new column immediately.
Here are a few things you can try to resolve this issue:
- Confirm that the Glue job is reading the latest table metadata. Check the table definition in the Data Catalog to verify that the new column is listed, and make sure the job references the same database and table that the crawler updated. Changes can sometimes take a moment to propagate across AWS services.
- Start a new job run. A subsequent run of the same job may pick up the updated schema.
- Instead of relying on "refreshSchema": "true", explicitly specify the schema in your Glue job. This forces the job to use the schema you provide rather than a cached version.
- Check whether your data is partitioned. If it is, you might need to update the partition metadata separately, because partitions can carry their own column definitions.
- Verify that the IAM role associated with your Glue job has the permissions it needs to read the updated table metadata in the Glue Data Catalog.
- As a workaround, read the data directly from S3 using create_dynamic_frame.from_options() instead of from_catalog(). This bypasses the Glue Data Catalog and infers the schema from the data itself.
- If none of the above works, consider dropping and recreating the table in the Glue Data Catalog, then re-running the crawler. This should ensure that the table definition is completely refreshed.
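To check the first and fourth points, it can help to look at what the Data Catalog actually stores. The following is a minimal sketch, assuming you pass in a boto3 Glue client; the helper names and the database, table, and column names are hypothetical placeholders, not part of your setup. It lists the table-level columns and reports any partitions whose own column list does not include the new column.

```python
from typing import Any, List


def table_column_names(glue_client: Any, database: str, table: str) -> List[str]:
    """Return the column names stored in the table-level definition."""
    response = glue_client.get_table(DatabaseName=database, Name=table)
    columns = response["Table"]["StorageDescriptor"]["Columns"]
    return [col["Name"] for col in columns]


def partitions_missing_column(
    glue_client: Any, database: str, table: str, column: str
) -> List[List[str]]:
    """Return the partition values of every partition whose own
    column list does not contain `column`."""
    stale = []
    paginator = glue_client.get_paginator("get_partitions")
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        for partition in page["Partitions"]:
            descriptor = partition.get("StorageDescriptor", {})
            names = [c["Name"] for c in descriptor.get("Columns", [])]
            if column not in names:
                stale.append(partition["Values"])
    return stale


# Example usage (needs boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   glue = boto3.client("glue")
#   print(table_column_names(glue, "my_database", "my_table"))
#   print(partitions_missing_column(glue, "my_database", "my_table", "new_column"))
```

If the table-level list shows the new column but some partitions do not, the partition metadata is the likely place to look.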
If the problem persists after trying these solutions, it might be worth contacting AWS support for further assistance, as there could be an underlying issue with the Glue service or how it's interacting with your specific data setup.
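For the direct-from-S3 workaround mentioned in the list above, a minimal sketch might look like the following. The helper name read_from_s3, the bucket path, and the default format are assumptions for illustration; adjust them to match your data. The helper takes the GlueContext as an argument so it can be dropped into an existing job script.

```python
from typing import Any


def read_from_s3(glue_context: Any, s3_path: str, data_format: str = "parquet") -> Any:
    """Read a DynamicFrame straight from S3, bypassing the Data Catalog,
    so the schema is inferred from the files themselves."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_path]},
        format=data_format,
    )


# Example usage inside a Glue job (the path is a placeholder):
#   from awsglue.context import GlueContext
#   from pyspark.context import SparkContext
#   glue_context = GlueContext(SparkContext.getOrCreate())
#   dyf = read_from_s3(glue_context, "s3://my-bucket/my-prefix/")
#   dyf.printSchema()  # the new column should appear here if it is in the files
```

If the new column shows up with this approach but not with from_catalog(), that points to the catalog metadata rather than the data as the source of the problem.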