How to deal with zombie Airflow BackfillJobs

In our MWAA v2.2.2 environment, we are seeing several BackfillJobs that have live heartbeats but no corresponding DAG runs in a running state. On the Browse > Jobs page, these jobs still show as running. They may have been caused by running a backfill for the same DAG multiple times. For some reason they never failed or completed, which left them in a zombie state.
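To make the symptom concrete, here is a minimal sketch of the check described above: a job counts as orphaned when it is a running BackfillJob whose DAG has no run in the running state. The `JobRow` and `DagRunRow` classes and the `find_orphaned_backfill_jobs` helper are hypothetical stand-ins. Their field names are meant to mirror the `job` and `dag_run` tables in the Airflow metadata database, but treat that mapping as an assumption and verify it against your own environment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class JobRow:
    """Hypothetical stand-in for one row of Airflow's `job` table."""
    id: int
    dag_id: str
    job_type: str
    state: str


@dataclass(frozen=True)
class DagRunRow:
    """Hypothetical stand-in for one row of Airflow's `dag_run` table."""
    dag_id: str
    state: str


def find_orphaned_backfill_jobs(
    jobs: Iterable[JobRow], dag_runs: Iterable[DagRunRow]
) -> List[JobRow]:
    """Return running BackfillJobs whose DAG has no run in the running state."""
    # Collect every DAG that still has at least one running DAG run.
    dags_with_running_runs = {r.dag_id for r in dag_runs if r.state == "running"}
    return [
        j
        for j in jobs
        if j.job_type == "BackfillJob"
        and j.state == "running"
        and j.dag_id not in dags_with_running_runs
    ]
```

Fed with rows exported from the metadata database, the function returns the jobs that match the zombie description. It only identifies them and does not terminate anything.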

From the Airflow docs, it looks like we would need access to the underlying machines to terminate the processes for these backfill jobs. Given that MWAA is a managed service running on Fargate, how would we go about terminating these jobs? Is there a way to forcibly cycle all the containers?

ukayani
asked a year ago, 574 views
1 Answer
You can programmatically force a hard reset of the environment by following the recommendations provided here (in the top answer). The steps listed in that answer restart MWAA and its containers.
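The linked answer is not reproduced in this thread, so the following is only a sketch of one way such a reset is often triggered, not necessarily the method it recommends. It assumes that changing an Airflow configuration option through the MWAA `UpdateEnvironment` API causes the service to apply an environment update and replace its containers. The helper name `force_environment_update` and the option key and value passed to it are hypothetical, and the caller is expected to choose a harmless option.

```python
from typing import Any, Dict


def force_environment_update(
    client: Any, env_name: str, option_key: str, option_value: str
) -> Dict[str, str]:
    """Set one Airflow configuration option to trigger an environment update.

    `client` is expected to behave like boto3.client("mwaa").
    """
    # Read the current options first so the update keeps the existing
    # settings (assumption: the supplied map replaces the stored one).
    current = client.get_environment(Name=env_name)["Environment"]
    options = dict(current.get("AirflowConfigurationOptions") or {})
    options[option_key] = option_value
    client.update_environment(
        Name=env_name, AirflowConfigurationOptions=options
    )
    return options
```

With boto3 installed, a call would look like `force_environment_update(boto3.client("mwaa"), "my-environment", "core.default_task_retries", "1")`, where the environment name and option are placeholders. Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real environment.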

AWS
Andrew
answered a year ago
