I want to configure AWS Glue to automatically start a job when a crawler run completes.
Resolution
You can use AWS Glue triggers to start a job when a crawler run completes. However, when you work with triggers, the AWS Glue console supports only jobs, not crawlers. To configure triggers for both jobs and crawlers, use the AWS Command Line Interface (AWS CLI) or the AWS Glue API.
Run the following AWS CLI command to create a trigger that can start a job when a crawler run completes:
```shell
aws glue create-trigger --name testTrigger \
  --type CONDITIONAL \
  --predicate 'Logical=AND,Conditions=[{LogicalOperator=EQUALS,CrawlerName=testCrawler,CrawlState=SUCCEEDED}]' \
  --actions JobName=testJob \
  --start-on-creation
```
Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.
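After you create the trigger, it might pass through transitional states such as CREATING or ACTIVATING before it reports ACTIVATED. The following minimal sketch polls `get_trigger` until the trigger reaches a target state. It is not an official AWS utility. The helper name `wait_for_trigger_state`, its timeout and polling defaults, and the assumption that `glue` is a boto3 Glue client are all illustrative. The helper accepts any object that has a compatible `get_trigger` method.

```python
import time


def wait_for_trigger_state(glue, name, target="ACTIVATED",
                           timeout_s=60.0, poll_s=2.0, sleep=time.sleep):
    """Poll glue.get_trigger until the trigger reaches `target` or time runs out.

    Hypothetical helper: `glue` is assumed to be a boto3 Glue client.
    Returns the last observed trigger state.
    """
    waited = 0.0
    while True:
        # get_trigger returns {"Trigger": {..., "State": "..."}}.
        state = glue.get_trigger(Name=name)["Trigger"]["State"]
        if state == target or waited >= timeout_s:
            return state
        sleep(poll_s)
        waited += poll_s
```

For example, `wait_for_trigger_state(boto3.client("glue"), "testTrigger")` should return "ACTIVATED" once the trigger is active. From the shell, `aws glue get-trigger --name testTrigger` shows the same State field.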
You can also create the trigger with the AWS SDK for Python (Boto3):
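```python
import boto3

# Create an AWS Glue client (uses your default credentials and Region).
client = boto3.client("glue")

# Conditional trigger: start testJob when testCrawler finishes with CrawlState SUCCEEDED.
response = client.create_trigger(
    Name="testTrigger",
    Type="CONDITIONAL",
    Predicate={
        "Logical": "AND",
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": "testCrawler",
                "CrawlState": "SUCCEEDED",
            },
        ],
    },
    Actions=[
        {"JobName": "testJob"},
    ],
    # Activate the trigger immediately instead of leaving it in the CREATED state.
    StartOnCreation=True,
)
```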
Either of the preceding approaches creates the trigger testTrigger, which starts the job testJob after the crawler testCrawler completes successfully.
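The request is just a dictionary of `create_trigger` arguments, so you can build it with a small function and reuse it for other crawler and job names. The following sketch assumes that the `Predicate` accepts several crawler conditions combined with `Logical` set to "AND" (all must succeed) or "ANY". The function name `build_crawler_trigger_request` is hypothetical.

```python
def build_crawler_trigger_request(trigger_name, crawler_names, job_name,
                                  logical="AND", start_on_creation=True):
    """Return keyword arguments for glue.create_trigger (hypothetical helper).

    The trigger starts `job_name` when the crawlers in `crawler_names`
    finish with CrawlState SUCCEEDED, combined with `logical` ("AND" or "ANY").
    """
    if not crawler_names:
        raise ValueError("at least one crawler name is required")
    if logical not in ("AND", "ANY"):
        raise ValueError("logical must be 'AND' or 'ANY'")
    # One EQUALS condition per watched crawler.
    conditions = [
        {"LogicalOperator": "EQUALS", "CrawlerName": name, "CrawlState": "SUCCEEDED"}
        for name in crawler_names
    ]
    return {
        "Name": trigger_name,
        "Type": "CONDITIONAL",
        "Predicate": {"Logical": logical, "Conditions": conditions},
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": start_on_creation,
    }
```

For example, `client.create_trigger(**build_crawler_trigger_request("testTrigger", ["testCrawler"], "testJob"))` sends the same request as the Boto3 example.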
Note: testCrawler must be started by a trigger. If you start the crawler manually, then the trigger doesn't start testJob. In AWS Glue, a dependent job or crawler starts only if the job or crawler that it watches was itself started by a trigger. Make sure that all jobs and crawlers in a dependency chain are descendants of a scheduled or on-demand trigger.
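To satisfy this rule, one option is to start the crawler from an on-demand trigger and then fire that trigger instead of running the crawler directly. The following sketch is based on the dependency-chain rule in the preceding note and has not been checked against every AWS Glue configuration. The function names, the `prefix` parameter, and the generated trigger names are hypothetical, and `glue` is assumed to be a boto3 Glue client. A SCHEDULED trigger with a `Schedule` expression would work the same way for recurring runs.

```python
def create_crawler_then_job_chain(glue, crawler_name, job_name, prefix):
    """Create an on-demand trigger for the crawler and a conditional trigger
    for the job (hypothetical helper). Returns both trigger names."""
    start_name = f"{prefix}-start-crawler"
    after_name = f"{prefix}-after-crawl"
    # On-demand trigger whose only action is to start the crawler.
    glue.create_trigger(
        Name=start_name,
        Type="ON_DEMAND",
        Actions=[{"CrawlerName": crawler_name}],
    )
    # Conditional trigger that watches the crawler and starts the job on success.
    glue.create_trigger(
        Name=after_name,
        Type="CONDITIONAL",
        Predicate={
            "Logical": "AND",
            "Conditions": [
                {"LogicalOperator": "EQUALS", "CrawlerName": crawler_name,
                 "CrawlState": "SUCCEEDED"},
            ],
        },
        Actions=[{"JobName": job_name}],
        StartOnCreation=True,
    )
    return start_name, after_name


def run_chain(glue, start_trigger_name):
    """Fire the on-demand trigger so that the crawler is started by a trigger."""
    return glue.start_trigger(Name=start_trigger_name)["Name"]
```

With this setup, you run the pipeline by calling `run_chain(client, "demo-start-crawler")` (or `aws glue start-trigger --name demo-start-crawler`) instead of starting testCrawler yourself.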
Additionally, you can use one of the following methods: