Logs for MWAA task restarts


Hi team, we are using MWAA and Glue jobs, and in some cases our MWAA tasks fail even though the Glue jobs succeed. When that happens, we need to restart the job and/or clear the task status in MWAA.

Is there a way to track these restarts in CloudTrail, or log them somewhere, so we can keep track of which user initiated each one and get details from the error log?

asked 10 months ago, 665 views
2 Answers

Hello,

I understand that in some cases your MWAA tasks fail while the Glue jobs succeed. When this happens you have to restart the jobs, and you want to keep track of which user initiated each restart. I will answer your question in two parts below.

Issue 1:

MWAA task is failing but the Glue job succeeds.

The task runner on each worker periodically writes a heartbeat to the Airflow metadata database. If the scheduler does not see a heartbeat for a task within a set time limit, it marks the task as failed. MWAA itself only triggers your Glue job, then waits for the job to finish and report success before marking the task successful. This is why the Glue job can still succeed after the Airflow task has already been marked as failed.

By default, this time limit is set to 300 seconds, as described in the documents below:
[+] https://airflow.apache.org/docs/apache-airflow/2.2.2/configurations-ref.html#scheduler-zombie-task-threshold
[+] https://docs.aws.amazon.com/mwaa/latest/userguide/configuring-env-variables.html#configuring-env-variables-scheduler

To mitigate the issue, you can increase this threshold so that the task has enough time to complete. To set this configuration in MWAA, edit the environment's 'Airflow configuration options' and add the key 'scheduler.scheduler_zombie_task_threshold' with the value set to the time in seconds.
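
The same setting could also be applied programmatically. The following is a minimal Python sketch, assuming boto3 is installed and credentials are configured; the environment name "my-mwaa-env" and the value 600 are placeholders, not values from this thread. Before running it, it is worth checking whether passing AirflowConfigurationOptions replaces the options already set on the environment.

```python
from typing import Any, Dict


def build_zombie_threshold_update(env_name: str, threshold_seconds: int) -> Dict[str, Any]:
    """Build the keyword arguments for an MWAA UpdateEnvironment call."""
    if threshold_seconds <= 0:
        raise ValueError("threshold_seconds must be positive")
    return {
        "Name": env_name,
        "AirflowConfigurationOptions": {
            # Configuration option values are passed as strings.
            "scheduler.scheduler_zombie_task_threshold": str(threshold_seconds),
        },
    }


def apply_update(kwargs: Dict[str, Any]) -> None:
    """Send the update to MWAA (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    boto3.client("mwaa").update_environment(**kwargs)


if __name__ == "__main__":
    # "my-mwaa-env" and 600 are hypothetical placeholders.
    print(build_zombie_threshold_update("my-mwaa-env", 600))
    # apply_update(build_zombie_threshold_update("my-mwaa-env", 600))
```

Updating an environment's configuration triggers an environment update, so it is safer to do this outside of busy scheduling windows.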

Issue 2:

Keeping track of which user initiated the restart of the job

Whenever a job is started in Glue, the StartJobRun API is called and logged as a CloudTrail event. You can search for this API call in your CloudTrail logs and view the user who initiated the job run.
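
As an illustration of that search, here is a hedged Python sketch that looks up StartJobRun events with the boto3 CloudTrail client and pulls out who made each call. The function names are hypothetical, and the "jobName" key under requestParameters is an assumption about how the Glue event is recorded; inspect one raw event to confirm the field names in your account.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def summarize_start_job_runs(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Extract time, caller, and job name from CloudTrail lookup results."""
    summaries = []
    for event in events:
        # CloudTrailEvent is a JSON string holding the full event record.
        detail = json.loads(event.get("CloudTrailEvent", "{}"))
        params = detail.get("requestParameters") or {}
        identity = detail.get("userIdentity") or {}
        summaries.append({
            "time": event.get("EventTime"),
            "user": event.get("Username"),
            "caller_arn": identity.get("arn"),
            "job_name": params.get("jobName"),  # assumed field name
        })
    return summaries


def fetch_start_job_runs(hours: int = 24) -> List[Dict[str, Any]]:
    """Look up StartJobRun events from the last `hours` hours."""
    import boto3  # imported here so the summarizer works without boto3

    client = boto3.client("cloudtrail")
    end = datetime.now(timezone.utc)
    pages = client.get_paginator("lookup_events").paginate(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": "StartJobRun"}
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    events: List[Dict[str, Any]] = []
    for page in pages:
        events.extend(page["Events"])
    return summarize_start_job_runs(events)
```

Note that when the Glue job is started by an MWAA task, the recorded caller is likely to be the environment's execution role rather than the person who cleared the task in the Airflow UI.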

Additionally, you can set up Amazon SNS notifications for your jobs so that you are notified whenever a job's status changes (for instance, whenever a job is started or fails). Please refer to the documentation below.
[+] https://repost.aws/knowledge-center/glue-sns-notification-state
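
One way such notifications are commonly wired up is an EventBridge rule that matches Glue job state changes and targets an SNS topic. The Python sketch below is an assumption about that setup rather than a copy of the linked article; the rule name, topic ARN, and job names are placeholders you would supply, and the SNS topic's access policy must separately allow EventBridge to publish to it.

```python
import json
from typing import Any, Dict, List, Optional


def glue_state_change_pattern(
    states: List[str], job_names: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Build an EventBridge event pattern for Glue job state changes."""
    detail: Dict[str, Any] = {"state": states}
    if job_names:
        # Restrict the rule to specific jobs when names are given.
        detail["jobName"] = job_names
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": detail,
    }


def create_rule(rule_name: str, topic_arn: str, pattern: Dict[str, Any]) -> None:
    """Create the rule and point it at an SNS topic (requires boto3)."""
    import boto3  # imported here so the pattern builder works without boto3

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name, EventPattern=json.dumps(pattern), State="ENABLED"
    )
    events.put_targets(
        Rule=rule_name, Targets=[{"Id": "sns-target", "Arn": topic_arn}]
    )


if __name__ == "__main__":
    # Prints the pattern only; no AWS call is made here.
    print(json.dumps(glue_state_change_pattern(["FAILED", "TIMEOUT"]), indent=2))
```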

AWS
Ankur_J
answered 10 months ago

I tried looking in CloudTrail, but there wasn't any indicator of when jobs were restarted or by whom.

We thought about creating an interface to act as a control panel for MWAA that would trigger restarts and log the person who started each one, but then realized that the Airflow API is not accessible. I did read that this is being considered. Any updates?

answered 9 months ago
