AWS Glue Studio visual editor: S3 target not updating schema


Hi,

I am using the Glue Studio visual editor to build an ETL job with an S3 target. In the target configuration I selected the option "Create a table in the Data Catalog and on subsequent runs, update the schema and add new partitions" so that the schema is updated automatically, but it doesn't work, and the logs show no errors. The first time I run the job the table is created correctly, but if I then change, for example, the output format from Parquet to JSON, the Glue table is not updated. Any idea why?
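For reference, the option described above appears in a Glue job script as an S3 sink with catalog updates enabled. The following is a minimal sketch, assuming the job uses `getSink` with `enableUpdateCatalog` and `updateBehavior="UPDATE_IN_DATABASE"` as described in the AWS Glue documentation; the helper name `write_with_catalog_update` and its parameters are hypothetical, and the script actually generated for a given job may differ, so it is worth comparing against the Script tab.

```python
def write_with_catalog_update(glue_context, frame, s3_path, database, table, fmt="json"):
    """Write `frame` to S3 and ask Glue to create or update the catalog table.

    Assumptions: `glue_context` is an awsglue.context.GlueContext and `frame`
    is a DynamicFrame. Both are passed in, so this helper is only meaningful
    inside a Glue job. The function name and parameters are illustrative.
    """
    sink = glue_context.getSink(
        connection_type="s3",
        path=s3_path,
        enableUpdateCatalog=True,             # update the Data Catalog on each run
        updateBehavior="UPDATE_IN_DATABASE",  # overwrite the table schema, add partitions
        partitionKeys=[],                     # no partition columns in this sketch
    )
    sink.setFormat(fmt)
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table)
    sink.writeFrame(frame)
    return sink
```

If the generated script shows a different `updateBehavior` or a different format than the one selected in the visual editor, that may point to where the configuration is not being applied.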

Thanks

Paolo
asked a year ago, 566 views
1 Answer

Please check the "Job bookmark" option under Job details. When job bookmarks are enabled, AWS Glue persists state information from each job run to track data that has already been processed, which prevents old data from being reprocessed. I suspect the bookmark is enabled in your case, so when you re-run the job it skips the data because nothing has changed since the first run. To re-run the job over the same data, either disable the job bookmark or make a change to the source data.
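Besides the console setting, the bookmark behavior can also be set per run through the `--job-bookmark-option` job argument. The sketch below only builds that argument; the helper name `bookmark_arguments` is hypothetical.

```python
def bookmark_arguments(enabled):
    """Return the Glue job argument that turns job bookmarks on or off."""
    option = "job-bookmark-enable" if enabled else "job-bookmark-disable"
    return {"--job-bookmark-option": option}


# Arguments for a one-off run that reprocesses all source data.
run_arguments = bookmark_arguments(enabled=False)
```

With boto3, a dictionary like this could be passed as the `Arguments` parameter of the Glue client's `start_job_run` call, alongside the name of your own job.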

AWS
EXPERT
answered a year ago
  • I checked, and the job bookmark is set to disabled.
