MWAA / DAG / Size


We are reaching the maximum number of jobs for our MWAA environment. We are using the Small environment class, which is listed on this page (https://aws.amazon.com/managed-workflows-for-apache-airflow/pricing/) as having capacity for up to 50 DAGs.

When AWS recommends fewer than 50 DAGs, what average number of tasks per DAG is that suggestion based on? Would it be 10 tasks per DAG, or 100?

From my understanding, Airflow goes through every task in every DAG every few minutes to see what needs to be scheduled. We've noticed some queuing starting after we added about 20 new tasks for a particular period in the batch.

Any recommendations here?

1 answer

Hi,

With regard to environment capacity, the estimate shown in the documentation is a guideline on expected capacity when deploying DAGs.

These estimates are based on lightweight tasks and should be treated as reference points, not absolute values.

Airflow tasks on MWAA are executed within containers that run Python code, and task performance depends primarily on the compute and memory available to the workers and the scheduler.

This information is also outlined in the Airflow Best Practices.

A smaller environment has workers with less memory and processing power, so it cannot run as many DAGs (or tasks) as a larger environment.
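If the Small class itself turns out to be the constraint, the environment class, the worker ceiling, and scheduler settings can all be changed in place with an environment update. A minimal sketch using boto3 (the environment name and the specific values are illustrative placeholders):

```python
# Sketch only: resize an MWAA environment and tune scheduler parsing.
# "my-mwaa-env" and all values below are placeholders to adapt.
import boto3

mwaa = boto3.client("mwaa")

mwaa.update_environment(
    Name="my-mwaa-env",             # placeholder environment name
    EnvironmentClass="mw1.medium",  # step up from mw1.small
    MaxWorkers=10,                  # more parallel task slots under load
    AirflowConfigurationOptions={
        # Re-parse DAG files less frequently so the scheduler spends less
        # of its cycle scanning; value in seconds, illustrative only.
        "scheduler.min_file_process_interval": "60",
    },
)
```

The environment enters an UPDATING state while the change is applied, so plan the update for a quiet window.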

Treat the guideline as a rule of thumb, since not all tasks require the same amount of memory and processing: some DAGs, and by extension their tasks, will consume more resources than others.

You therefore need to consider the complexity of your particular tasks to determine how many your environment can be expected to handle.

Because the sustainable number of tasks per DAG depends on your use case, you would need to run a benchmark test to find the number your environment can actually support.
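As a starting point for such a benchmark, a DAG of deliberately lightweight tasks makes it easy to watch how long tasks sit in the queued state as the task count grows. A minimal sketch (the DAG id, schedule, and task count are arbitrary):

```python
# Sketch only: a synthetic DAG for load-testing task throughput on the
# environment. Raise NUM_TASKS gradually and watch queued-vs-running counts.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

NUM_TASKS = 20  # arbitrary starting point; increase until queuing appears


def noop():
    """Lightweight placeholder task body."""
    pass


with DAG(
    dag_id="mwaa_capacity_benchmark",  # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(minutes=30),
    catchup=False,
) as dag:
    for i in range(NUM_TASKS):
        PythonOperator(task_id=f"noop_{i}", python_callable=noop)
```

Raising NUM_TASKS until queuing appears gives a concrete ceiling for your workload instead of the generic 50-DAG figure.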

AWS
Support Engineer
Sid_S
answered 8 months ago
  • Our Airflow DAGs are used for scheduling purposes only and are mostly Glue job triggers (waiting for completion) and file watchers, but we may have only a few running, or 20 or more starting at the same time. We're seeing queuing of 10-15 minutes at our current environment size and are exploring the larger one.

    Thanks for your response. I think that even though our tasks are low complexity, it may be the quantity that is causing the queuing.
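
    One thing worth testing for the file watchers is running the sensors in reschedule mode, which releases the worker slot between checks instead of holding it for the whole wait; with 20 or more watchers starting at once, that alone can clear a lot of queuing. A minimal sketch, assuming an S3 file watcher in front of a Glue job (the DAG id, bucket, key, and job name are placeholders):

```python
# Sketch only: sensors in "reschedule" mode release their worker slot
# between pokes, so many concurrent watchers don't each pin a slot while idle.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    dag_id="glue_trigger_with_watcher",   # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_input_file",
        bucket_name="my-bucket",          # placeholder bucket
        bucket_key="incoming/data.csv",   # placeholder key
        mode="reschedule",                # free the worker slot between checks
        poke_interval=120,                # check every 2 minutes
        timeout=60 * 60,                  # give up after an hour
    )

    run_glue_job = GlueJobOperator(
        task_id="run_glue_job",
        job_name="my-glue-job",           # placeholder Glue job name
    )

    wait_for_file >> run_glue_job
```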
