How to deal with zombie Airflow BackfillJobs

In our MWAA v2.2.2 environment, we are seeing several BackfillJobs with live heartbeats but no corresponding DAG runs in a running state. On the Browse > Jobs page, these jobs still show as running. These backfill jobs may have been caused by running a backfill for the same DAG multiple times. For some reason they never failed or completed, which left them in a zombie state.

Reading the Airflow docs, it looks like we'd need access to the underlying machines to terminate the processes for these backfill jobs. Given that MWAA is a managed service running on Fargate, how would we go about terminating these jobs? Is there a way to forcibly cycle all the containers?
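
To pin down which jobs are affected before restarting anything, the symptom described above (a BackfillJob marked running with no running backfill DAG run for the same DAG) can be expressed as a simple filter. This is only a minimal sketch: `JobRow`, `DagRunRow`, and `find_orphaned_backfills` are hypothetical names, and the fields are assumed to mirror the `job` and `dag_run` tables of the Airflow metadata database. How the rows are fetched on MWAA is left out.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class JobRow:
    """Hypothetical stand-in for one row of Airflow's `job` table."""
    id: int
    dag_id: str
    job_type: str  # e.g. "BackfillJob" or "SchedulerJob"
    state: str     # e.g. "running", "success", "failed"


@dataclass(frozen=True)
class DagRunRow:
    """Hypothetical stand-in for one row of Airflow's `dag_run` table."""
    dag_id: str
    run_type: str  # e.g. "backfill", "scheduled", "manual"
    state: str


def find_orphaned_backfills(
    jobs: Iterable[JobRow], dag_runs: Iterable[DagRunRow]
) -> List[JobRow]:
    """Return running BackfillJobs whose DAG has no running backfill run."""
    # DAGs that still have at least one backfill run in the running state.
    active_dags = {
        run.dag_id
        for run in dag_runs
        if run.run_type == "backfill" and run.state == "running"
    }
    # A job is suspect if it claims to be running but nothing backs it.
    return [
        job
        for job in jobs
        if job.job_type == "BackfillJob"
        and job.state == "running"
        and job.dag_id not in active_dags
    ]


if __name__ == "__main__":
    sample_jobs = [
        JobRow(1, "dag_a", "BackfillJob", "running"),
        JobRow(2, "dag_b", "BackfillJob", "running"),
    ]
    sample_runs = [DagRunRow("dag_b", "backfill", "running")]
    for job in find_orphaned_backfills(sample_jobs, sample_runs):
        print(job.id, job.dag_id)
```

The filter works per DAG, not per job, so if several backfills were started for the same DAG and one is still genuinely running, the stale ones for that DAG would not be flagged.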

ukayani
asked a year ago, 619 views
1 Answer

Reading the Airflow docs, it looks like we'd need access to the underlying machines to terminate the processes for these backfill jobs. Given that MWAA is a managed service running on Fargate, how would we go about terminating these jobs? Is there a way to forcibly cycle all the containers?

You can programmatically force a hard reset of the environment by following the recommendations provided here (in the top answer). The steps listed in that answer restart MWAA and its containers.

AWS
Andrew
answered a year ago