- Newest
- Most votes
- Most comments
I had a similar requirement. Below is how I managed to make it work:
You will have to make some changes in your code and only use AWS CLI to reset the bookmarks. When you call job.init() instead of passing the default JOB_NAME arg, pass a unique name for each table you are processing, for example whatever you set as the transformation_ctx since it will be unique. With this, Glue will create a bookmark for each table based on the name you set.
As you would guess, resetting the bookmark in glue UI will fail since glue will assume the bookmark jobname is the same as the original job name you see in the Glue UI. So to reset the bookmark to a job run, you will need to use the CLI and replace the --Jobname parameter with the name set directly in the script.
Example: aws glue reset-job-bookmark --job-name <job_name_in_script> --run-id jr_xxxxxxxxxxx
You can only rewind job bookmarks to any previous job run - https://docs.aws.amazon.com/cli/latest/reference/glue/reset-job-bookmark.html Since there are multiple tables being processed in a single job, this would mean reprocessing data for all of the tables - even for those tables where this issue didn't happen. It seems like the first option would be better - to make the entire job fail even if one table load is failing.
Relevant content
- asked a year ago
- AWS OFFICIALUpdated 10 months ago
- AWS OFFICIALUpdated 8 months ago
