How to rewind Job bookmark programatically

1

I am using Glue service to read the files and migrating to database. The same script is run for 30-40 tables. The S3 path and table name are changing dynamically through a csv file I am passing. Doing this to avoid creating that many jobs. Each datasource being read includes their own dedicated transformation_ctx property. Next time when the job runs again it picks where the tables where last read.The problem I am facing is when any of the table load fails. For those too the file was read already but write did not happen, due to which I would lose the data which was read but not written for that specific table in the next run. Below are the possibilities I have come up with: 1. Make the entire job fail if any of the table load is failing 2. Add notification for failed table sent over email (so that I could troubleshoot) and rewind bookmark for the failed table and process next tables.

I am unable to achieve the second option, as I don't want to stop other tables from being written. I would like to rewind the bookmark by code or reprocess files for that table only, not all tables.

Can I achieve this with any other way?

質問済み 2年前1299ビュー
1回答
-1

You can only rewind job bookmarks to any previous job run - https://docs.aws.amazon.com/cli/latest/reference/glue/reset-job-bookmark.html Since there are multiple tables being processed in a single job, this would mean reprocessing data for all of the tables - even for those tables where this issue didn't happen. It seems like the first option would be better - to make the entire job fail even if one table load is failing.

AWS
回答済み 2年前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ