MWAA Scheduler does not respect file_min_process_interval

We have a 2.0.2 MWAA cluster running in us-east-1. We recently tried to implement dynamic DAG generation instead of Jinja-templating our DAG files. To do so, we read from os.environ and use the requests library to make a GET request to an internal API service that returns the configurations for the DAGs we need to generate.

Upon copying this file to S3, the DAG processor starts processing the new DAG file every second and ignores changes to the min_file_process_interval Airflow configuration. We tried raising the value from 30 to 300 to slow processing down, but the scheduler seems to ignore the parameter and continually reprocesses the DAG file. As a downstream consequence, the many resulting GET requests are hammering our internal API.

Note that the MWAA cluster processes other DAG files and respects the interval for them, including DAG files in the same S3 "directory". Only this one file is affected. Other than scheduler.min_file_process_interval and scheduler.parsing_processes, there are no other Airflow configuration overrides.

What could be causing this?

rchui
asked 1 year ago · 316 views
1 Answer

The setting is scheduler.min_file_process_interval, not scheduler.file_min_process_interval

https://docs.aws.amazon.com/mwaa/latest/userguide/best-practices-tuning.html#best-practices-tuning-dag-folders
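For reference, here is a minimal sketch of how the two overrides mentioned in the question could be assembled for the MWAA API. The helper name `build_mwaa_update` and the environment name `my-env` are hypothetical; the keys use the `section.option` form with string values, and the boto3 call is shown only as a comment on the assumption that boto3 and credentials are available.

```python
def build_mwaa_update(environment_name, interval_seconds, parsing_processes=None):
    """Build keyword arguments for an MWAA environment update (hypothetical helper)."""
    # MWAA configuration overrides are a string-to-string map keyed "section.option".
    options = {"scheduler.min_file_process_interval": str(interval_seconds)}
    if parsing_processes is not None:
        options["scheduler.parsing_processes"] = str(parsing_processes)
    return {"Name": environment_name, "AirflowConfigurationOptions": options}


# "my-env" is a placeholder environment name, not taken from the question.
kwargs = build_mwaa_update("my-env", 300, parsing_processes=2)

# Assuming boto3 is installed and credentials are configured, the request
# could then be sent with:
#   import boto3
#   boto3.client("mwaa", region_name="us-east-1").update_environment(**kwargs)
```

Comparing the generated map against the overrides shown in the MWAA console is a quick way to rule out a misspelled key.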

AWS
John_J
answered 1 year ago
  • Sorry John, I do have it set as scheduler.min_file_process_interval; that was a typo on my part. Even if I had set the wrong setting, the scheduler should still have respected the 30-second wait time before reprocessing, which it did not.
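This does not explain why the interval is ignored for this one file, but the load on the internal API described in the question can be bounded regardless of how often the file is parsed. Below is a minimal sketch, assuming the DAG file may write to local temporary storage: the API response is cached on disk with a time-to-live, so a parse only triggers a GET when the cached copy is stale. The names `load_dag_configs`, `CACHE_PATH`, and `CACHE_TTL_SECONDS` are hypothetical, and `fetch` stands in for whatever function performs the real request.

```python
import json
import os
import tempfile
import time

# Hypothetical cache location and lifetime; neither comes from the original post.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "dag_configs_cache.json")
CACHE_TTL_SECONDS = 300


def load_dag_configs(fetch, cache_path=CACHE_PATH, ttl=CACHE_TTL_SECONDS, now=time.time):
    """Return DAG configs, calling fetch() only when the cached copy is stale."""
    try:
        with open(cache_path) as fh:
            cached = json.load(fh)
        if now() - cached["fetched_at"] < ttl:
            return cached["configs"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable, or malformed cache: fall through and refetch

    configs = fetch()
    # Write to a temporary file and rename it, so a concurrent parser
    # never reads a partially written cache file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w") as fh:
        json.dump({"fetched_at": now(), "configs": configs}, fh)
    os.replace(tmp_path, cache_path)
    return configs


# In the DAG file, fetch could wrap the existing request, for example
# (CONFIG_API_URL is a hypothetical environment variable name):
#   configs = load_dag_configs(
#       lambda: requests.get(os.environ["CONFIG_API_URL"], timeout=10).json()
#   )
```

The cache is local to the machine that parses the file, so if several schedulers or workers parse it, each would keep its own copy and the API would see at most one request per TTL from each of them.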
