How to deal with zombie Airflow BackfillJobs


In our MWAA v2.2.2 environment, we are seeing several BackfillJobs with live heartbeats but no corresponding DAG runs in a running state. On the Browse > Jobs page, these jobs still show as running. They may have been caused by running a backfill for the same DAG multiple times. For some reason they never failed or completed, which left them in a zombie state.
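To make the condition concrete, here is a minimal sketch of the check described above. It assumes the rows have already been read from Airflow's `job` and `dag_run` metadata tables. The `JobRecord` and `DagRunRecord` types, the `find_zombie_backfills` helper, and the 5-minute heartbeat window are hypothetical and only illustrate the logic; they are not part of Airflow.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical stand-in for a row of Airflow's `job` table."""
    id: int
    dag_id: str
    job_type: str
    state: str
    latest_heartbeat: datetime


@dataclass(frozen=True)
class DagRunRecord:
    """Hypothetical stand-in for a row of Airflow's `dag_run` table."""
    dag_id: str
    state: str


def find_zombie_backfills(
    jobs: Iterable[JobRecord],
    dag_runs: Iterable[DagRunRecord],
    now: datetime,
    heartbeat_window: timedelta = timedelta(minutes=5),  # assumed threshold
) -> List[JobRecord]:
    """Return running BackfillJobs that still heartbeat but have no running DAG run."""
    dags_with_running_runs = {r.dag_id for r in dag_runs if r.state == "running"}
    return [
        job
        for job in jobs
        if job.job_type == "BackfillJob"
        and job.state == "running"
        and now - job.latest_heartbeat <= heartbeat_window  # heartbeat is still live
        and job.dag_id not in dags_with_running_runs  # nothing for it to work on
    ]
```

Matching on `dag_id` alone is a simplification; a stricter check could also compare the backfill's date range against the DAG runs.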

Reading the Airflow docs, it looks like we would need access to the underlying machines to terminate the processes for these backfill jobs. Given that MWAA is a managed service running on Fargate, how would we go about terminating these jobs? Is there a way to forcibly cycle all the containers?

ukayani
asked a year ago, 617 views
1 Answer


You can programmatically force a hard reset of the environment by following the recommendations provided here (in the top answer). The steps listed in that answer will restart MWAA and its containers.
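As a rough illustration of what a programmatic reset could look like, the sketch below applies an environment update through the MWAA API, which to my understanding causes MWAA to roll out fresh containers. This is an assumption about the approach and not a copy of the linked answer. The helper names `build_update_request` and `force_environment_update`, and the option key and value the caller passes in, are hypothetical; `get_environment` and `update_environment` are the boto3 MWAA client calls.

```python
from typing import Any, Dict


def build_update_request(
    environment: Dict[str, Any], option_key: str, option_value: str
) -> Dict[str, Any]:
    """Build update_environment kwargs that keep existing options and change one."""
    # Copy the current options so the update does not drop any of them.
    options = dict(environment.get("AirflowConfigurationOptions") or {})
    options[option_key] = option_value
    return {"Name": environment["Name"], "AirflowConfigurationOptions": options}


def force_environment_update(
    mwaa_client: Any, env_name: str, option_key: str, option_value: str
) -> Dict[str, Any]:
    """Apply a small configuration change so the environment goes through an update."""
    env = mwaa_client.get_environment(Name=env_name)["Environment"]
    request = build_update_request(env, option_key, option_value)
    mwaa_client.update_environment(**request)
    return request


# Usage (assumes boto3 is installed and AWS credentials are configured;
# the environment name and option below are placeholders):
#   import boto3
#   client = boto3.client("mwaa")
#   force_environment_update(client, "my-environment", "core.default_task_retries", "1")
```

Choose an option whose new value is harmless for your workloads, and expect the environment to be unavailable for updates while the change is being applied.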

AWS
Andrew
answered a year ago
