How to rewind a job bookmark programmatically

I am using AWS Glue to read files and migrate them to a database. The same script runs for 30-40 tables; the S3 path and table name change dynamically based on a CSV file I pass in, which avoids creating that many jobs. Each data source that is read has its own dedicated transformation_ctx property, so the next time the job runs it picks up where each table was last read.

The problem I am facing is what happens when a table load fails. For that table the files have already been read but the write did not happen, so in the next run I would lose the data that was read but not written. Below are the possibilities I have come up with:

1. Make the entire job fail if any table load fails.
2. Send an email notification for the failed table (so that I can troubleshoot), rewind the bookmark for that table, and continue processing the next tables.

I prefer the second option because I don't want to stop the other tables from being written, but I have not been able to implement it. I would like to rewind the bookmark from code, or reprocess the files, for the failed table only and not for all tables.

Can I achieve this in any other way?

asked 2 years ago, 1269 views
1 Answer

You can rewind a job bookmark to any previous job run, but only for the job as a whole: https://docs.aws.amazon.com/cli/latest/reference/glue/reset-job-bookmark.html Since multiple tables are processed in a single job, a rewind would reprocess data for all of the tables, including those where the issue didn't happen. The first option therefore seems better: make the entire job fail if even one table load fails.
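For reference, the same reset can be issued from code. Below is a minimal sketch, assuming the boto3 Glue client and its reset_job_bookmark call; the helper name rewind_bookmark is hypothetical, and the client is passed in as an argument so the function can be exercised without AWS credentials. As with the CLI command, the reset applies to the whole job, not to a single table's transformation_ctx.

```python
from typing import Any, Dict, Optional


def rewind_bookmark(glue_client: Any, job_name: str,
                    run_id: Optional[str] = None) -> Dict[str, Any]:
    """Reset the bookmark of a Glue job (hypothetical helper).

    glue_client is expected to behave like boto3.client("glue").
    If run_id is given, it is passed through as RunId so the bookmark
    is reset relative to that job run; otherwise only JobName is sent.
    """
    kwargs: Dict[str, Any] = {"JobName": job_name}
    if run_id is not None:
        kwargs["RunId"] = run_id
    # reset_job_bookmark affects every bookmarked source in the job,
    # so all tables handled by the job are affected, not just one.
    return glue_client.reset_job_bookmark(**kwargs)
```

With boto3 installed and credentials configured, this could be called as rewind_bookmark(boto3.client("glue"), "my-migration-job"), where the job name is a placeholder.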

AWS
answered 2 years ago
