The behavior you're seeing, where AWS Glue crawlers create new table versions in the Glue Data Catalog even when the schema hasn't changed, is expected: by default, each crawler run updates the table metadata (including the "last updated" timestamp), and any metadata update results in a new version in the Data Catalog.
However, there are a few approaches you can consider to minimize unnecessary version creation:
- Use incremental crawls: Instead of running full crawls weekly, you can configure your crawler to perform incremental crawls. This way, the crawler will only process new or modified data, potentially reducing the likelihood of creating new versions when no changes have occurred.
- Adjust the crawler's update behavior: In the crawler's configuration, under "Catalog options," you can modify the update behavior. Instead of "Update the table definition in the data catalog," you could choose "Add new columns only." This option will only update the table if new columns are detected, which may reduce unnecessary version creation.
- Custom ETL job: If you need more fine-grained control, you could create a custom AWS Glue ETL job that checks for schema changes before updating the Data Catalog. This approach would allow you to implement your own logic for when to create new versions.
- Use the Schema Registry: For certain types of data, you might benefit from using the AWS Glue Schema Registry. This can help manage schema evolution more precisely, though it's primarily designed for streaming data.
- Regular cleanup: If the accumulation of versions is a concern, you could implement a process to periodically clean up old versions that are no longer needed.
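As a rough sketch of the custom ETL approach above: the job would fetch the current catalog definition, diff the column sets, and only call `UpdateTable` when something actually changed. The function and variable names here are illustrative assumptions, and the Glue client (e.g. `boto3.client("glue")`) is passed in rather than created inside:

```python
def schemas_differ(current_cols, new_cols):
    """Compare two column lists ([{'Name': ..., 'Type': ...}, ...]),
    ignoring column order."""
    normalize = lambda cols: sorted((c["Name"], c["Type"]) for c in cols)
    return normalize(current_cols) != normalize(new_cols)

def update_if_changed(glue, database, table, new_cols):
    """Update the catalog table only when the crawled columns differ,
    so unchanged runs do not create a new table version."""
    table_def = glue.get_table(DatabaseName=database, Name=table)["Table"]
    current_cols = table_def["StorageDescriptor"]["Columns"]
    if not schemas_differ(current_cols, new_cols):
        return False  # no schema change: skip the update entirely
    sd = dict(table_def["StorageDescriptor"], Columns=new_cols)
    # TableInput accepts only a subset of the fields get_table returns
    table_input = {
        "Name": table_def["Name"],
        "StorageDescriptor": sd,
        "TableType": table_def.get("TableType", "EXTERNAL_TABLE"),
        "Parameters": table_def.get("Parameters", {}),
        "PartitionKeys": table_def.get("PartitionKeys", []),
    }
    glue.update_table(DatabaseName=database, TableInput=table_input)
    return True
```

In a real job you would call this with `boto3.client("glue")` and the columns your crawl or ETL step inferred; the diff logic can be extended (e.g. to also compare partition keys or parameters) depending on what you consider a "schema change."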
It's important to note that while these approaches can help reduce unnecessary version creation, they may not completely eliminate it. The creation of new versions, even with only timestamp changes, is part of AWS Glue's design to maintain a history of metadata changes. This can be beneficial for auditing and tracking purposes, but it's understandable if it doesn't align with your specific use case.
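If you go the periodic-cleanup route, a small script around the `GetTableVersions` and `BatchDeleteTableVersion` APIs could prune all but the newest versions. The retention count and the names below are placeholder assumptions, and the Glue client is again passed in:

```python
def old_version_ids(versions, keep=5):
    """Given entries from get_table_versions, return the VersionIds of
    everything beyond the newest `keep`. VersionIds are numeric strings."""
    ordered = sorted(versions, key=lambda v: int(v["VersionId"]), reverse=True)
    return [v["VersionId"] for v in ordered[keep:]]

def prune_table_versions(glue, database, table, keep=5):
    """Delete all but the `keep` most recent versions of one table."""
    paginator = glue.get_paginator("get_table_versions")
    versions = []
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        versions.extend(page["TableVersions"])
    stale = old_version_ids(versions, keep)
    # Delete in chunks; the API accepts a bounded list of ids per call
    for i in range(0, len(stale), 100):
        glue.batch_delete_table_version(
            DatabaseName=database, TableName=table, VersionIds=stale[i : i + 100]
        )
    return stale
```

Run on a schedule (e.g. from a Lambda or a small Glue Python shell job), this keeps version history bounded while still preserving the recent changes you care about for schema tracking.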
If this behavior significantly impacts your workflows, you may want to consider reaching out to AWS support for further guidance or to suggest this as a feature request for more granular control over version creation in the Glue Data Catalog.
Sources
- AWS Glue Catalog Data Tables | AWS re:Post
- CatalogSchemaChangePolicy - AWS Glue
- AWS Glue Data Catalog - AWS Prescriptive Guidance
- AWS Glue Data Catalog best practices - AWS Glue
- Configure a crawler to handle schema changes | AWS re:Post

Incremental crawls will still create new versions, right? I am mainly using this for schema tracking.
I cannot use "Add new columns only", since I want to track table deletions as well.
The Schema Registry does not work for the connectors I want to use.
Are there any other options?