How to pass parameters from an event rule through a Glue workflow trigger to a job

I have an event rule that triggers a Glue job. I would like to pass information from the event details as a parameter to the Glue job. For example, say my event contains {"details": {"database_name": "my_database"}} and my job has a parameter --DATABASE_NAME. Is it possible to pass database_name from the event as the --DATABASE_NAME parameter of the job? I haven't been able to find anything in the documentation or online on how to do it.

Thanks!

asked 2 years ago, 4812 views
2 Answers

There is no native option to pass EventBridge event details to a Glue job, but I used the following workaround:

  1. Get the event ID from the Glue workflow run properties (glue_client is a boto3 Glue client):
import datetime
import json

# The [1:-1] slice strips the first and last characters (the enclosing brackets) of aws:eventIds
event_id = glue_client.get_workflow_run_properties(
    Name=self.args['WORKFLOW_NAME'],
    RunId=self.args['WORKFLOW_RUN_ID'])['RunProperties']['aws:eventIds'][1:-1]
  2. Get all NotifyEvent events for the last several minutes (event_client is a boto3 CloudTrail client, since lookup_events is a CloudTrail API). It's up to you to decide how much time can pass between the workflow start and your job start.
now = datetime.datetime.now(datetime.timezone.utc)  # timezone-aware, so the window does not depend on the host's local time
response = event_client.lookup_events(
    LookupAttributes=[{'AttributeKey': 'EventName', 'AttributeValue': 'NotifyEvent'}],
    StartTime=now - datetime.timedelta(minutes=5),
    EndTime=now)['Events']
  3. Check which event has an enclosed event with the event ID we got from the Glue workflow.
event = None  # stays None if no matching event is found in the time window
for record in response:
    event_payload = json.loads(record['CloudTrailEvent'])['requestParameters']['eventPayload']
    if event_payload['eventId'] == event_id:
        event = json.loads(event_payload['eventBody'])
        break

The event variable then holds the full content of the event that triggered the Glue workflow.
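
The matching step can also be pulled into small helpers that are easy to check without AWS access. This is only a sketch: the names parse_event_ids and find_triggering_events are hypothetical, the record layout (CloudTrailEvent, requestParameters, eventPayload, eventId, eventBody) is taken from the snippet in this answer, and the idea that aws:eventIds may hold several comma-separated IDs inside the brackets is an assumption.

```python
import json


def parse_event_ids(raw):
    """Split a bracketed string such as "[id-1, id-2]" into a list of IDs.

    Assumption: aws:eventIds is a bracketed, comma-separated string.
    """
    inner = raw.strip()[1:-1]
    return [part.strip() for part in inner.split(",") if part.strip()]


def find_triggering_events(cloudtrail_records, event_ids):
    """Return the decoded eventBody of every record whose eventId is wanted."""
    wanted = set(event_ids)
    bodies = []
    for record in cloudtrail_records:
        detail = json.loads(record["CloudTrailEvent"])
        # Records without requestParameters or eventPayload are skipped.
        payload = (detail.get("requestParameters") or {}).get("eventPayload") or {}
        if payload.get("eventId") in wanted:
            bodies.append(json.loads(payload["eventBody"]))
    return bodies
```

With the variables from the steps in this answer, this would be called as find_triggering_events(response, parse_event_ids(raw_ids)), where raw_ids is the unsliced aws:eventIds value.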

answered a year ago

To pass a parameter to a workflow, use the StartWorkflowRun[1] API with RunProperties, which takes key-value pairs. Any job in the workflow can read these properties using GetWorkflowRunProperties[2] and, if required, modify them using PutWorkflowRunProperties[3].
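
A minimal sketch of those three calls with boto3 might look like the following. The helper names are hypothetical, and glue_client is assumed to be a boto3 Glue client (boto3.client("glue")) created by the caller; the client is passed in so the helpers can be exercised with a stub.

```python
def start_workflow_with_properties(glue_client, workflow_name, properties):
    """Start a workflow run and pass `properties` as RunProperties."""
    response = glue_client.start_workflow_run(
        Name=workflow_name,
        # Keys and values are converted to strings before being sent.
        RunProperties={str(k): str(v) for k, v in properties.items()},
    )
    return response["RunId"]


def read_run_properties(glue_client, workflow_name, run_id):
    """Read the run properties from inside any job of the workflow."""
    response = glue_client.get_workflow_run_properties(
        Name=workflow_name, RunId=run_id
    )
    return response["RunProperties"]


def update_run_properties(glue_client, workflow_name, run_id, updates):
    """Add or overwrite run properties for later jobs in the same run."""
    glue_client.put_workflow_run_properties(
        Name=workflow_name,
        RunId=run_id,
        RunProperties={str(k): str(v) for k, v in updates.items()},
    )
```

For example, start_workflow_with_properties(glue_client, "my-workflow", {"database_name": "my_database"}) would start a run of a hypothetical workflow named my-workflow. Inside a job, the workflow name and run ID come from the WORKFLOW_NAME and WORKFLOW_RUN_ID job arguments, as in the other answer.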

If you are triggering an ETL job instead of a workflow, use StartJobRun[4] and set the job arguments. To access these job arguments in the script, use getResolvedOptions[5].
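
One way to apply this to the question's example is to have the event rule invoke a small Lambda function that copies the field into the job arguments and calls StartJobRun. The sketch below assumes that setup, which the thread does not describe: the function names, the job name my-glue-job, and the use of Lambda are all hypothetical. The "details" key follows the question's example; real EventBridge events carry their payload under "detail".

```python
def build_job_arguments(event):
    """Map the question's example event onto Glue job arguments."""
    # Key names follow the question's example event (an assumption).
    database_name = event["details"]["database_name"]
    return {"--DATABASE_NAME": database_name}


def handler(event, context=None, glue_client=None, job_name="my-glue-job"):
    """Lambda-style entry point: start the job with arguments from the event."""
    if glue_client is None:
        # Imported lazily so the mapping logic can be tested without boto3.
        import boto3
        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(
        JobName=job_name,  # hypothetical job name
        Arguments=build_job_arguments(event),
    )
    return response["JobRunId"]
```

Inside the job script, getResolvedOptions(sys.argv, ['DATABASE_NAME']) from awsglue.utils then returns a dictionary whose DATABASE_NAME entry holds the value taken from the event.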

[1] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-workflow.html#aws-glue-api-workflow-StartWorkflowRun

[2] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-workflow.html#aws-glue-api-workflow-GetWorkflowRunProperties

[3] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-workflow.html#aws-glue-api-workflow-PutWorkflowRunProperties

[4] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-jobs-runs.html#aws-glue-api-jobs-runs-StartJobRun

[5] https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html

AWS
answered 2 years ago
  • Hi Sachin,

    The one part that isn't clear to me from your answer is this: if I have a trigger configured in Glue based on an EventBridge event, how would you map the event contents to the job as parameters? The example in the question is the field database_name. If that value can be different for each event, how do you configure the trigger to take that field and pass it in as a key/value pair? If you can't configure it in the trigger, is there a way, once the job starts, to get the event body in the job and extract the info?
