MWAA Heartbeat failure

Our MWAA environment regularly shows the message

The scheduler does not appear to be running. Last heartbeat was received x minutes ago.

and scheduling of our DAGs is delayed.

I have checked the scheduler logs: there are no errors from installing dependencies, and no error messages before or after the heartbeats drop to zero. I have also included

--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.7.txt" in the requirements file.

There are 2 schedulers running with

scheduler.parsing_processes=7

but I have experimented with smaller numbers.

In the logs I do see

Creating session with aws_access_key_id=None region_name=us-east-1
 role_arn is None

but we are able to use the Kubernetes operator correctly and our DAGs are being pulled from S3, so I don't believe this message is relevant.

Airflow version 2.2.2

Happy to add any additional information.

asked a year ago · 368 views
1 Answer
MWAA provides at most 4 vCPUs (in a large environment), so you effectively have 4 parallel processes. Raising scheduler.parsing_processes above the Airflow default of 2 leaves no resources to actually schedule tasks or update the heartbeat, which is the error you are encountering.
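As a rough back-of-the-envelope check, the arithmetic can be sketched as below. The function name spare_vcpus is hypothetical, and the model (each parsing process keeps one vCPU fully busy) is a simplifying assumption, not a description of how MWAA allocates CPU.

```python
def spare_vcpus(vcpus, parsing_processes):
    """Return the vCPUs left for scheduling and heartbeats.

    Assumption (not from MWAA docs): every parsing process
    occupies one whole vCPU while it is parsing DAG files.
    """
    return vcpus - parsing_processes


# The setting from the question: 7 parsing processes on 4 vCPUs.
print(spare_vcpus(4, 7))  # -3, i.e. oversubscribed

# The Airflow default of 2 parsing processes on the same 4 vCPUs.
print(spare_vcpus(4, 2))  # 2
```

Under that assumption, a negative result means the parsing processes alone can saturate the scheduler's CPU.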

The "Creating session with aws_access_key_id" message is an INFO-level log entry from the Airflow Amazon provider package; it is not specific to MWAA and does not indicate an error condition.

Airflow is highly sensitive to DAG parsing, because it runs a Python import on every Python file. I recommend using .airflowignore to skip files that do not create top-level DAG objects, and removing top-level code wherever possible (i.e. most code should be executed in a task rather than at parse time; see example here).
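To find which files are expensive to parse, one option is to time a plain Python import of each file outside Airflow. The sketch below uses only the standard library; time_imports is a hypothetical helper, not part of Airflow or MWAA, and it only approximates what the scheduler does. It assumes the files' own imports (such as airflow) are installed locally; files that fail to import are still reported with the time spent before the failure.

```python
import importlib.util
import sys
import time
from pathlib import Path


def time_imports(dag_folder):
    """Import each .py file under dag_folder once.

    Returns a list of (seconds, path) tuples, slowest first.
    Note: this executes the files' top-level code, so run it
    only against code you trust.
    """
    results = []
    for path in sorted(Path(dag_folder).rglob("*.py")):
        # Load the file under a throwaway module name.
        spec = importlib.util.spec_from_file_location(f"_probe_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        try:
            spec.loader.exec_module(module)
        except Exception as exc:
            print(f"{path}: import failed: {exc!r}", file=sys.stderr)
        elapsed = time.perf_counter() - start
        results.append((elapsed, str(path)))
    return sorted(results, reverse=True)
```

Calling time_imports("dags") and printing the first few entries should point at the files whose top-level code (database queries, API calls, heavy imports) is worth moving into tasks or excluding through .airflowignore.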

AWS
John_J
answered a year ago
