How do I use AWS Glue workflows to automatically start a job when a crawler run completes?

I want to use AWS Glue workflows to automatically start a job when a crawler run completes.

Short description

To start a job when a crawler run completes, create an AWS Glue workflow and two triggers: an on-demand trigger that starts the crawler, and an event trigger that starts the job after the crawler run completes. This method requires you to start the crawler from the Workflows page on the AWS Glue console.

Note: You can also use an AWS Lambda function and an Amazon EventBridge rule to automate job runs. If you choose this option, then the Lambda function is always on. The function monitors the crawler regardless of where or when you start the crawler. For more information, see How can I use a Lambda function to automatically start an AWS Glue job when a crawler run completes?

Resolution

Prerequisites: To complete the resolution steps, you must have an AWS Glue extract, transform, and load (ETL) job and an AWS Glue crawler. You must also have an AWS Identity and Access Management (IAM) role for AWS Glue that has the AWSGlueServiceRole policy attached.

Create the workflow

Complete the following steps:

  1. Open the AWS Glue console.
  2. In the navigation pane, choose Workflows, and then choose Add workflow.
  3. Enter a name for the workflow, and then choose Add workflow. The new workflow appears in the list on the Workflows page.
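
If you prefer to script this step, the following is a minimal sketch of the same action through the AWS SDK for Python (Boto3). It assumes that `glue` is a Boto3 Glue client that you created yourself, for example with `boto3.client("glue")`. The helper name `create_workflow` and the workflow name in the usage comment are hypothetical.

```python
def create_workflow(glue, name, description=""):
    """Create an empty AWS Glue workflow and return its name.

    `glue` is assumed to be a Boto3 Glue client (hypothetical setup:
    glue = boto3.client("glue")).
    """
    params = {"Name": name}
    if description:
        # Description is optional, so it is sent only when provided.
        params["Description"] = description
    response = glue.create_workflow(**params)
    return response["Name"]


# Hypothetical usage, assuming Boto3 is installed and credentials are configured:
#   import boto3
#   create_workflow(boto3.client("glue"), "crawler-then-job-workflow")
```

The client is passed in as an argument so that you can choose the Region and credentials outside the helper.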

Create the trigger for the crawler

Complete the following steps:

  1. On the Workflows page, select your new workflow, and then choose the Graph tab.
  2. Choose Add trigger, and then choose the Add new tab. For Trigger type, choose On demand.
  3. Choose Add. The trigger appears on the graph.
  4. On the graph, choose Add node.
  5. On the Crawlers tab, select your crawler, and then choose Add.
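
The console steps above can also be approximated in code. The following sketch builds the request for an on-demand trigger that starts the crawler, and then passes the request to the Glue `create_trigger` API. The function names and the default trigger name `start-crawler` are hypothetical, and `glue` is assumed to be a Boto3 Glue client.

```python
def on_demand_crawler_trigger(workflow_name, crawler_name, trigger_name="start-crawler"):
    """Build create_trigger parameters for an on-demand trigger that starts a crawler."""
    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,  # attaches the trigger to the workflow graph
        "Type": "ON_DEMAND",            # runs only when the workflow is started
        "Actions": [{"CrawlerName": crawler_name}],
    }


def create_start_trigger(glue, workflow_name, crawler_name, trigger_name="start-crawler"):
    """Create the on-demand trigger and return its name.

    `glue` is assumed to be a Boto3 Glue client.
    """
    params = on_demand_crawler_trigger(workflow_name, crawler_name, trigger_name)
    return glue.create_trigger(**params)["Name"]
```

Keeping the parameter builder separate from the API call makes it possible to inspect or log the request before you send it.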

Create the trigger for the AWS Glue job

Complete the following steps:

  1. In the Action menu above the graph, choose Add trigger.
  2. Choose the Add new tab, and then select the following options:
    For Trigger type, choose Event.
    For Trigger logic, choose Start after ALL watched event.
  3. Choose Add. The trigger appears on the graph.
  4. On the graph, to the left of the job trigger that you just created, choose Add node.
  5. On the Crawlers tab, select your crawler, and then choose Add. The crawler node appears on the graph.
  6. On the graph, to the right of the job trigger that you just created, choose Add node.
  7. On the Jobs tab, select the job that you want to start when the crawler run completes, and then choose Add.
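
As a rough equivalent of these steps, the following sketch builds a `create_trigger` request that watches the crawler and starts the job. It rests on several assumptions that the article doesn't state: that the console's Event trigger type corresponds to the API type `CONDITIONAL`, that "Start after ALL watched event" corresponds to the predicate logic `AND`, and that the crawler state to watch is `SUCCEEDED`. The function name and the default trigger name are hypothetical.

```python
def job_after_crawler_trigger(workflow_name, crawler_name, job_name,
                              trigger_name="start-job-after-crawler"):
    """Build create_trigger parameters that start a job after a crawler succeeds."""
    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",    # assumption: matches the console's Event trigger type
        "StartOnCreation": True,  # activate the trigger as soon as it is created
        "Predicate": {
            "Logical": "AND",     # assumption: matches "Start after ALL watched event"
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler_name,
                    "CrawlState": "SUCCEEDED",  # assumption: watch for a successful crawl
                }
            ],
        },
        "Actions": [{"JobName": job_name}],
    }


# Hypothetical usage with a Boto3 Glue client named `glue`:
#   glue.create_trigger(**job_after_crawler_trigger("my-workflow", "my-crawler", "my-job"))
```

With a single watched event, `AND` and `ANY` behave the same. The choice matters only if you later add more conditions to the trigger.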

Test the workflow

Complete the following steps:

  1. On the Actions menu, next to the Add workflow button, choose Run. The Last run status column changes to Running.
  2. Check the Graph tab to see the status of the workflow. Or, open your corresponding crawler or job to confirm that it's running.
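
To run the same test from a script, one option is to start a workflow run and poll its status. The following sketch assumes that `glue` is a Boto3 Glue client and uses the `start_workflow_run` and `get_workflow_run` APIs. The helper name, the polling interval, and the timeout are hypothetical choices.

```python
import time

# Workflow run statuses after which no further polling is needed.
TERMINAL_STATUSES = {"COMPLETED", "STOPPED", "ERROR"}


def run_workflow(glue, workflow_name, poll_seconds=15.0, timeout_seconds=3600.0):
    """Start a workflow run and wait until it reaches a terminal status.

    Returns a (run_id, status) tuple. Raises TimeoutError if the run is
    still in progress after `timeout_seconds`.
    """
    run_id = glue.start_workflow_run(Name=workflow_name)["RunId"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = glue.get_workflow_run(Name=workflow_name, RunId=run_id)["Run"]
        if run["Status"] in TERMINAL_STATUSES:
            return run_id, run["Status"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Workflow run {run_id} did not finish in time")
        time.sleep(poll_seconds)
```

A status of `COMPLETED` describes the workflow run as a whole, so it's still worth confirming on the Graph tab that the crawler and the job each succeeded.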

Related information

Creating and building out a workflow manually in AWS Glue

AWS OFFICIAL, updated 6 months ago