
Can we have one job for loading multiple tables with different bookmarks storing the last execution file


I need to write multiple tables from S3 to an RDS database. Can I create just one job and pass the table names as parameters?
With the number of tables to be loaded, it would be tedious to create a separate job for each table.

For example, there are two S3 paths, s3://my_bucket/table_A and s3://my_bucket/table_B, with Parquet files generated every hour. I need to load the data from S3 into the table_A and table_B tables respectively, and also keep track of the last processed file for each. I know this is possible, but will the job bookmark save the last processed file for both paths?

Any other way to achieve this?

2 Answers

Hello,

In order to achieve this use case, here is one option. It involves combining Job Parameters and Job Bookmarks with a little bit of coding:

Leverage Job Parameters to programmatically pass different arguments to the job (in this case, the S3 paths or prefixes that point to each dataset to be read). The arguments are then retrieved in the script via getResolvedOptions (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html).
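As a rough sketch (the parameter names --s3_path and --table_name are just examples chosen here for illustration), the values could be retrieved in the script like this:

```python
import sys
from awsglue.utils import getResolvedOptions

# Parameters passed to the job run as --s3_path and --table_name (example names)
args = getResolvedOptions(sys.argv, ["JOB_NAME", "s3_path", "table_name"])
s3_path = args["s3_path"]        # e.g. s3://my_bucket/table_A
table_name = args["table_name"]  # e.g. table_A
```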

In terms of Job Bookmarks, this shouldn't be a problem as long as each data source being read (i.e., each S3 path containing a dataset) has its own dedicated transformation_ctx property. That is, the value of transformation_ctx for each data source read has to be unique. This can be achieved with the same approach as in the previous point: use Job Parameters and retrieve them as arguments to be used later as the value of transformation_ctx (or reuse the same arguments you pass for the S3 paths themselves). The most important things are: (1) each data source has to have a unique transformation_ctx; and (2) subsequent job runs should use the same transformation_ctx value for each data source, respectively. That way, the bookmarks will be able to keep track of the already-processed data for each data source, i.e., for each S3 path.
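For illustration only, a read of one dataset could look like the sketch below, where the table name passed as a job parameter is reused to build the transformation_ctx (the exact naming is up to you, as long as it is unique per dataset and stays stable across job runs):

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# "args" as retrieved via getResolvedOptions in the earlier snippet
frame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [args["s3_path"]]},
    format="parquet",
    # Unique per dataset, and reused unchanged in every subsequent job run
    transformation_ctx="read_" + args["table_name"],
)
```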

See Tracking Processed Data Using Job Bookmarks (https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html) for more details about transformation_ctx and Job Bookmarks in general. Also make sure the script includes the bookmark-related lines highlighted in the sample script at https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html#monitor-continuations-script
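Those highlighted lines are the Job initialization and commit calls. As a rough outline (building on the snippets above), the script needs something like:

```python
from awsglue.job import Job

job = Job(glueContext)            # glueContext as created in the sketch above
job.init(args["JOB_NAME"], args)

# ... create_dynamic_frame reads (each with its own transformation_ctx)
#     and writes to the RDS target go here ...

job.commit()                      # persists the bookmark state for this run
```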

SUPPORT ENGINEER
answered a month ago
  • Yes, I am able to achieve this, but it leads to a serial run of the table loads in the target. If any of the earlier table loads fails, the entire run stops and the remaining tables are not loaded. Is there any other way to have a single script for all of the tables while still saving the bookmark state?

  • This worked, but what if one of the table loads fails and I have to rewind the bookmark? There is no way to rewind a bookmark programmatically. Is there any other way of rewinding the bookmark for specific tables so that the data can be read again in the next run?


Hello,

Please provide the following details:

1. The method that you are using to migrate from S3 to RDS
2. The RDS engine that you are migrating to

Looking forward to your response.

SUPPORT ENGINEER
answered a month ago
  • Using the Glue service, I am reading the files and migrating them to Postgres. But the same script needs to run for hundreds of tables; only the S3 path and table name need to be dynamic. Otherwise I would have to create that many jobs. I was thinking of creating multiple workflows and passing run parameters, but the job bookmarks will not be stored for all tables at once. Is there any other way to do this?
