Looking for different ways to Orchestrate Glue ETL jobs


I have a scenario where I have multiple glue ETL jobs which are interdependent, and they have some logical order to follow. I am looking for the best possible approaches within aws solutions to trigger a group of Glue jobs, based on the success/failure state of a different group of Glue jobs, i.e., setting up a combination of series and parallel execution of jobs under one entity which can then be reused in another such entity to avoid building the whole flow again (like we use a shell script to group and conditionally orchestrate python scripts). A GUI visual to represent the dataflows will be an added advantage.

I have tried and tested the Workflow feature within Glue to simulate this requirement, I was able to create the grouping of jobs based on triggers, but the major drawback was that I could not call/invoke existing workflows into a bigger workflow (like a parent WF which can fire the end-to-end ETL), thereby requiring me to build the whole flow again each and every time.

I have knowledge on SAP BODS ETL (if we need to draw comparisons), requesting experts' views to address this requirement. Thanks in advance!

1 Answer

I would recommend looking into Step Functions, a serverless orchestration service, which lets you orchestrate over 200 AWS services, including Glue jobs.

profile picture
answered 9 days ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions