Glue Workflow: Add the same Glue job multiple times, each with different parameters


I have created a templated job (with parameters) that ingests data from different tables (the database and table are passed as parameters) and writes the data to S3 (the destination bucket is passed as a parameter as well). Now I would like to create a workflow in which I can add the job multiple times, once for each table I need to ingest from. But in the Workflow UI, when I try to add the same job again, it doesn't add a new block for the job. How should I approach this?

Asked 2 years ago, viewed 351 times
1 Answer

Hello,

To run the same Glue job multiple times with different parameters for each execution, you can follow these steps:

Create a Parameterised Glue Job [1]: First, make sure your Glue job is parameterised properly so that you can pass different values for the database, table, and destination bucket each time it runs. You can use Glue's job parameters feature to achieve this. When defining your Glue job, declare the parameters you want to pass dynamically, such as database, table, and destination_bucket. Also make sure that the IAM role the job uses to write to S3 has the required S3 permissions (assuming the bucket is in the same account).
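To illustrate the idea, here is a minimal sketch of starting one run of the same job per table with boto3. The job, database, table, and bucket names in the usage comment are hypothetical, and the parameter keys simply mirror the names mentioned above; the Glue client is passed in so that the functions make no assumptions about how credentials are set up.

```python
def build_arguments(database, table, destination_bucket):
    """Return the Arguments mapping for one run of the templated job.

    Glue job parameter names are passed with a leading "--". Inside the job
    script they would typically be read with
    awsglue.utils.getResolvedOptions(sys.argv, ["database", "table", ...]).
    """
    return {
        "--database": database,
        "--table": table,
        "--destination_bucket": destination_bucket,
    }


def start_runs(glue_client, job_name, database, tables, destination_bucket):
    """Start one run of the same job for each table and return the run IDs."""
    run_ids = []
    for table in tables:
        response = glue_client.start_job_run(
            JobName=job_name,
            Arguments=build_arguments(database, table, destination_bucket),
        )
        run_ids.append(response["JobRunId"])
    return run_ids


# Example usage (hypothetical names, requires AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   start_runs(glue, "ingest-template-job", "sales_db",
#              ["orders", "customers"], "my-destination-bucket")
```

Note that this loop does not wait for a run to finish before starting the next one, so the job's maximum concurrent runs setting would likely need to be at least the number of tables; otherwise Glue may reject the extra runs.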

Use AWS Step Functions [2]: AWS Step Functions lets you create a workflow that coordinates multiple AWS services, including Glue jobs. You can use Step Functions to loop through a list of tables and invoke the same Glue job once per table, each time with different parameters. This tutorial shows how to create a Step Functions workflow: https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-workflow-studio-using.html
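One possible shape for such a state machine is sketched below: a Map state iterates over a `tables` list in the execution input and calls the Glue `startJobRun.sync` integration for each item. This is only an outline under assumptions: the state names, the input layout, and the job name are hypothetical, and the definition is built as a Python dict purely so it can be printed as JSON and pasted into Workflow Studio or passed to an API call.

```python
import json


def build_definition(job_name, max_concurrency=1):
    """Build an Amazon States Language definition that runs one Glue job per table.

    Expected execution input (hypothetical layout):
    {"tables": [{"database": "...", "table": "...", "destination_bucket": "..."}]}
    """
    return {
        "Comment": "Run the same Glue job once per table",
        "StartAt": "IngestEachTable",
        "States": {
            "IngestEachTable": {
                "Type": "Map",
                "ItemsPath": "$.tables",
                # 1 means one table at a time; raise it to run in parallel.
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "RunGlueJob",
                    "States": {
                        "RunGlueJob": {
                            "Type": "Task",
                            # ".sync" makes the state wait for the job run to finish.
                            "Resource": "arn:aws:states:::glue:startJobRun.sync",
                            "Parameters": {
                                "JobName": job_name,
                                # Keys ending in ".$" are filled from the current item.
                                "Arguments": {
                                    "--database.$": "$.database",
                                    "--table.$": "$.table",
                                    "--destination_bucket.$": "$.destination_bucket",
                                },
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


# "ingest-template-job" is a hypothetical job name.
print(json.dumps(build_definition("ingest-template-job"), indent=2))
```

With `MaxConcurrency` left at 1 the tables are processed one after another, which avoids needing concurrent runs of the job; the state machine's execution role would also need permission to start and monitor the Glue job.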

References:

[1] AWS Glue job parameters: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

[2] What is AWS Step Functions? https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html

AWS
Answered 8 months ago
