How to rewind a job bookmark programmatically


I am using Glue to read files and migrate them to a database. The same script runs for 30-40 tables: the S3 path and table name change dynamically through a CSV file I pass in, so I don't have to create that many jobs. Each datasource being read has its own dedicated transformation_ctx property, so the next time the job runs it picks up where each table was last read.

The problem is when any table load fails. For that table the files were already read but the write did not happen, so on the next run I lose the data that was read but not written for that specific table. These are the options I have come up with:

1. Make the entire job fail if any table load fails.
2. Send an email notification for the failed table (so that I can troubleshoot), rewind the bookmark for that table only, and continue processing the remaining tables.
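A minimal sketch of the setup described above together with option 1, assuming a driver CSV with hypothetical columns s3_path and table_name and a hypothetical src_<table> naming scheme for transformation_ctx. The Glue-specific parts (reading with a transformation_ctx, writing to the database, job.commit()) are passed in as plain callables so the control flow can be shown without awsglue; in a real Glue script load_table would wrap the read and write for one table, and commit would be job.commit(). As we understand the bookmark model, bookmark state is only persisted when job.commit() runs, so failing before that point leaves every table's bookmark where it was.

```python
import csv
import io
from typing import Callable, Iterable, List, NamedTuple


class TableSpec(NamedTuple):
    s3_path: str
    table_name: str

    @property
    def transformation_ctx(self) -> str:
        # One bookmark key per source table; this naming scheme is hypothetical.
        return f"src_{self.table_name}"


def read_table_specs(csv_text: str) -> List[TableSpec]:
    """Parse the driver CSV (hypothetical columns: s3_path, table_name)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [TableSpec(row["s3_path"].strip(), row["table_name"].strip())
            for row in reader]


def load_all_or_fail(specs: Iterable[TableSpec],
                     load_table: Callable[[TableSpec], None],
                     commit: Callable[[], None]) -> None:
    """Option 1: stop at the first failing table so the bookmark is never committed."""
    for spec in specs:
        # Any exception propagates out of the loop, the run fails,
        # and commit() below is never reached.
        load_table(spec)
    # In a Glue script this would be job.commit(), which persists bookmark state.
    commit()
```

Because the loop stops at the first failure, a rerun re-reads every table from the last committed bookmark, including tables that were written before the failure; if the target writes are not idempotent, those tables may receive duplicate rows.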

I am unable to achieve the second option, and I don't want a single failure to stop the other tables from being written. I would like to rewind the bookmark in code, or reprocess the files for that table only, not for all tables.

Is there any other way to achieve this?

Asked 2 years ago · 1299 views
1 answer

You can only rewind a job bookmark to a previous job run (https://docs.aws.amazon.com/cli/latest/reference/glue/reset-job-bookmark.html), and the rewind applies to the job as a whole. Since multiple tables are processed in a single job, this would mean reprocessing data for all of the tables, even those where the load succeeded. The first option therefore seems better: make the entire job fail if even one table load fails.
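For reference, the rewind described above is the ResetJobBookmark API, which boto3 exposes as reset_job_bookmark on the Glue client. A minimal sketch of calling it, with the client injected so it can be exercised against a stub; note that the request takes only a job name and an optional run ID, with no parameter for an individual transformation_ctx, which is why it cannot target a single table.

```python
from typing import Any, Dict, Optional


def rewind_bookmark(glue_client: Any, job_name: str,
                    run_id: Optional[str] = None) -> Dict[str, Any]:
    """Call Glue's ResetJobBookmark for the whole job.

    With run_id, the bookmark is rewound to that earlier run; without it,
    the job's bookmark is reset entirely. Either way every
    transformation_ctx in the job is affected, not just one table.
    """
    request: Dict[str, Any] = {"JobName": job_name}
    if run_id is not None:
        request["RunId"] = run_id
    return glue_client.reset_job_bookmark(**request)
```

With a real client this would be called as rewind_bookmark(boto3.client("glue"), "my-migration-job", run_id=...), where the job name is hypothetical and the run ID comes from the job's run history.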

AWS
Answered 2 years ago
