AWS Glue Studio visual editor: S3 target not updating schema


Hi,

I am using the Glue Studio visual editor to write an ETL job with an S3 target. In the target configuration I selected the option "Create a table in the Data Catalog and on subsequent runs, update the schema and add new partitions" so that the schema is updated automatically, but it doesn't work, and the logs show no errors. The first time I run the job the table is created correctly, but if I then change, for example, the output format from Parquet to JSON, the Glue table is not updated. Any idea why?

Thanks

Paolo
asked a year ago, 556 views
1 Answer

Please check the "Job bookmark" option under Job details. When job bookmarks are enabled, AWS Glue persists state information from each job run to track data that has already been processed, which prevents old data from being reprocessed. I think bookmarks are enabled in your case, so when you re-run the job it skips the data because it has not changed since the first run. To force the job to process the data again, either disable the job bookmark or make a change to the source data.
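If you prefer to confirm the bookmark setting outside the console, here is a minimal sketch in Python. It assumes the job definition has the shape returned by the Glue `get_job` API, where the bookmark setting is stored in `DefaultArguments` under `--job-bookmark-option`. The helper name `bookmark_state` and the job name `my-etl-job` are placeholders for illustration, not something taken from your job.

```python
from typing import Any, Dict

# Glue job argument that controls bookmarks.
BOOKMARK_ARG = "--job-bookmark-option"


def bookmark_state(job_response: Dict[str, Any]) -> str:
    """Return 'enabled', 'paused', 'disabled', or 'unknown' for a get_job-style response."""
    default_args = job_response.get("Job", {}).get("DefaultArguments", {})
    # When the argument is absent, bookmarks are disabled (the Glue default).
    option = default_args.get(BOOKMARK_ARG, "job-bookmark-disable")
    return {
        "job-bookmark-enable": "enabled",
        "job-bookmark-pause": "paused",
        "job-bookmark-disable": "disabled",
    }.get(option, "unknown")


# Hypothetical usage with boto3 (needs AWS credentials; the job name is a placeholder):
#   import boto3
#   response = boto3.client("glue").get_job(JobName="my-etl-job")
#   print(bookmark_state(response))

if __name__ == "__main__":
    # Hand-written sample shaped like a get_job response, for illustration only.
    sample = {"Job": {"Name": "my-etl-job",
                      "DefaultArguments": {BOOKMARK_ARG: "job-bookmark-enable"}}}
    print(bookmark_state(sample))
```

If this reports "disabled", bookmarks are probably not the reason the data was skipped, and the cause lies elsewhere.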

AWS
EXPERT
answered a year ago
  • I checked and the job bookmark is set to disabled
