MWAA Scheduler does not respect file_min_process_interval


We have an MWAA cluster running Airflow 2.0.2 in us-east-1. We recently tried to implement dynamic DAG generation instead of Jinja templating our DAG files. To do so, the DAG file reads from os.environ and uses the requests library to make a GET request to an internal API service, which returns the configurations for the DAGs we need to generate.
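A minimal sketch of the pattern described above, for illustration only and not the actual file. The environment variable name DAG_CONFIG_URL, the response shape (a JSON list of objects with "dag_id" and "schedule" keys), the helper names, and the placeholder task are all assumptions.

```python
import os
from datetime import datetime


def fetch_configs(url):
    """GET the DAG configurations. This runs every time the file is parsed."""
    import requests  # lazy import: only needed when a URL is configured

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()  # assumed shape: [{"dag_id": ..., "schedule": ...}]


def dag_kwargs(cfg):
    """Translate one (hypothetical) config entry into DAG constructor kwargs."""
    return {
        "dag_id": cfg["dag_id"],
        "schedule_interval": cfg.get("schedule"),  # None means no schedule
        "start_date": datetime(2021, 1, 1),
    }


def build_dags(configs):
    """Create one DAG per config entry, keyed by dag_id."""
    from airflow import DAG
    from airflow.operators.dummy import DummyOperator  # Airflow 2.0.x path

    dags = {}
    for cfg in configs:
        dag = DAG(**dag_kwargs(cfg))
        DummyOperator(task_id="placeholder", dag=dag)  # stand-in for real tasks
        dags[dag.dag_id] = dag
    return dags


# DAG_CONFIG_URL is a hypothetical environment variable name.
config_url = os.environ.get("DAG_CONFIG_URL")
if config_url:
    # Airflow picks up DAG objects found in the module's global namespace.
    globals().update(build_dags(fetch_configs(config_url)))
```

Because the GET request sits at module level, it is issued on every parse of the file, so the parse frequency directly determines the load on the API.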

After we copy this file to S3, the DAG processor starts processing the new DAG file every second and ignores changes to the min_file_process_interval Airflow configuration. We tried raising the value from 30 to 300 to slow down processing, but the scheduler seems to ignore the parameter and continually reprocesses the DAG file. As a downstream consequence, the file makes many GET requests, which hammers our internal API.

Note that the MWAA cluster is processing other DAG files and respects the interval for them, including DAG files in the same S3 "directory". The problem affects only this one file. Other than scheduler.min_file_process_interval and scheduler.parsing_processes, there are no other Airflow configuration overrides.

What could be causing this?

rchui
asked a year ago · 302 views
1 Answer

The setting is scheduler.min_file_process_interval, not scheduler.file_min_process_interval

https://docs.aws.amazon.com/mwaa/latest/userguide/best-practices-tuning.html#best-practices-tuning-dag-folders

AWS
John_J
answered a year ago
  • Sorry John, I do have it set as scheduler.min_file_process_interval; that was a typo on my part. Even if I had set the wrong setting, the scheduler should still have respected the 30-second wait before reprocessing, which it did not.
