Logs for MWAA task restarts


Hi team, we are using MWAA and Glue jobs, and in some cases our MWAA tasks fail even though the Glue jobs succeed. When this happens we need to restart the job and/or clear the task status in MWAA.

Is there a way to track these restarts in CloudTrail, or log them somewhere, so we can keep track of which user initiated each one and get details from the error log?

Asked 10 months ago, viewed 688 times
2 Answers

Hello,

I understand that in some cases your MWAA tasks fail while the Glue jobs succeed. When that happens you have to restart the jobs, and you want to keep track of which user initiated each restart. I will answer your question in two parts below.

Issue 1:

The MWAA task fails but the Glue job succeeds.

The task runner of a worker periodically pushes a heartbeat to the metadata database. If the scheduler does not find a heartbeat within a set amount of time, the task is marked as failed. Essentially, MWAA only triggers your Glue job and then waits for it to finish and report success before the task is marked successful. If the heartbeat is missed while the task is waiting, the task can be marked as failed even though the Glue job itself keeps running and succeeds.

By default, this time limit is set to 300 seconds, as described in the documents below:
[+] https://airflow.apache.org/docs/apache-airflow/2.2.2/configurations-ref.html#scheduler-zombie-task-threshold
[+] https://docs.aws.amazon.com/mwaa/latest/userguide/configuring-env-variables.html#configuring-env-variables-scheduler

Therefore, to mitigate the issue, you can increase this threshold so that the task has enough time to complete successfully. To set this configuration in MWAA, edit the environment's 'Airflow configuration options' and add the key 'scheduler.scheduler_zombie_task_threshold' with the time in seconds as its value.
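If you prefer to apply the option programmatically rather than through the console, a minimal sketch with boto3 could look like the following. The environment name and the 1800-second value are placeholders, the helper names are hypothetical, and the sketch assumes that any configuration options you already rely on should be passed along with the new one.

```python
ZOMBIE_THRESHOLD_KEY = "scheduler.scheduler_zombie_task_threshold"


def build_update_request(environment_name, threshold_seconds, existing_options=None):
    """Build the keyword arguments for the MWAA UpdateEnvironment call.

    existing_options is a hypothetical parameter for options you already set;
    they are merged in here on the assumption that you want to keep them.
    """
    if threshold_seconds <= 0:
        raise ValueError("threshold_seconds must be positive")
    options = dict(existing_options or {})
    # MWAA expects configuration option values as strings.
    options[ZOMBIE_THRESHOLD_KEY] = str(threshold_seconds)
    return {"Name": environment_name, "AirflowConfigurationOptions": options}


def apply_update(request):
    """Send the request to MWAA. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    return boto3.client("mwaa").update_environment(**request)


if __name__ == "__main__":
    # "my-mwaa-environment" and 1800 are placeholder values.
    request = build_update_request("my-mwaa-environment", 1800)
    print(request)
    # apply_update(request)  # uncomment to actually update the environment
```

Updating an environment can take some time to complete, so it is worth trying this on a non-production environment first.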

Issue 2:

Keeping track of which user initiated the restart of the job.

Whenever a job is started in Glue, the StartJobRun API is called and the call is logged in CloudTrail events. You can search for this API in your CloudTrail logs and view the user who initiated the job run.
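As a rough sketch of that search, the following uses the CloudTrail LookupEvents API through boto3 to list recent StartJobRun calls. The helper names and the 7-day window are illustrative, and the `jobName` field under `requestParameters` is an assumption about how the event payload is laid out, so check it against one of your own events.

```python
import json
from datetime import datetime, timedelta, timezone


def summarize_start_job_runs(events):
    """Reduce CloudTrail LookupEvents records to who started which job, and when."""
    rows = []
    for event in events:
        # CloudTrailEvent holds the full event as a JSON string.
        detail = json.loads(event.get("CloudTrailEvent", "{}"))
        params = detail.get("requestParameters") or {}
        identity = detail.get("userIdentity") or {}
        rows.append({
            "time": event.get("EventTime"),
            "user": event.get("Username"),
            "caller_arn": identity.get("arn"),
            # Assumed field name; verify against a real event.
            "job_name": params.get("jobName"),
        })
    return rows


def fetch_start_job_run_events(days=7):
    """Fetch StartJobRun events from the last `days` days. Requires boto3 and credentials."""
    import boto3  # imported lazily so the summarizer works without AWS access

    client = boto3.client("cloudtrail")
    end = datetime.now(timezone.utc)
    pages = client.get_paginator("lookup_events").paginate(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "StartJobRun"}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
    )
    return [event for page in pages for event in page["Events"]]


if __name__ == "__main__":
    for row in summarize_start_job_runs(fetch_start_job_run_events()):
        print(row)
```

Note that CloudTrail records the identity that called the Glue API. For runs triggered from Airflow this is likely to be the MWAA environment's execution role rather than the person who cleared or restarted the task.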

Additionally, you can set up Amazon SNS notifications for your jobs so that you are notified whenever a job's status changes (for instance, whenever a job is started or fails). Please refer to the documentation below.
[+] https://repost.aws/knowledge-center/glue-sns-notification-state
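One common way to wire up such notifications is an EventBridge rule that matches Glue job state changes and targets an SNS topic. The sketch below assumes that approach; the rule name, target ID, topic ARN, and helper names are placeholders, and the chosen states are only an example.

```python
import json


def glue_state_change_pattern(job_names=None, states=("FAILED", "TIMEOUT", "STOPPED")):
    """Build an EventBridge event pattern for Glue job state changes."""
    detail = {"state": list(states)}
    if job_names:
        # Restrict the rule to specific jobs; omit to match every job.
        detail["jobName"] = list(job_names)
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": detail,
    }


def create_notification_rule(rule_name, topic_arn, pattern):
    """Create the rule and point it at an SNS topic. Requires boto3 and credentials.

    Assumes the topic's access policy already allows EventBridge to publish to it.
    """
    import boto3  # imported lazily so the pattern builder works without AWS access

    events = boto3.client("events")
    events.put_rule(Name=rule_name, EventPattern=json.dumps(pattern), State="ENABLED")
    events.put_targets(Rule=rule_name, Targets=[{"Id": "glue-status-sns", "Arn": topic_arn}])


if __name__ == "__main__":
    # "my-glue-job" is a placeholder job name.
    pattern = glue_state_change_pattern(job_names=["my-glue-job"])
    print(json.dumps(pattern, indent=2))
    # create_notification_rule("glue-job-status", "<your SNS topic ARN>", pattern)
```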

AWS
Ankur_J
Answered 10 months ago

I tried looking in CloudTrail, but there wasn't any indication of when jobs were restarted or by whom.

We thought about creating an interface to act as a control panel for MWAA that would trigger restarts while logging the person who started them, but then realized that the Airflow API is not accessible. I did read that this is something that is being considered. Any updates?

Answered 10 months ago
