We have a 2.0.2 MWAA cluster running in us-east-1. We recently tried to implement dynamic DAG generation instead of Jinja-templating our DAG files. To do so, we read from os.environ and use the requests library to make a GET request to an internal API service that returns the configurations for the DAGs we need to generate.
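For context, here is a minimal sketch of the pattern (the env var name, URL, and helper names are illustrative placeholders, not our actual code):

```python
import os

import requests

# Illustrative env var name and fallback URL -- the real internal API differs
CONFIG_URL = os.environ.get("DAG_CONFIG_URL", "http://dag-config.internal/api/dags")


def fetch_dag_configs(url):
    """GET the per-DAG configurations from the internal API."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json()


def dag_ids(configs):
    """Extract the dag_id of each returned configuration."""
    return [cfg["dag_id"] for cfg in configs]


# In the real DAG file this call sits at module top level, so it executes on
# every parse of the file by the scheduler's DAG processor:
# configs = fetch_dag_configs(CONFIG_URL)
# for cfg in configs:
#     globals()[cfg["dag_id"]] = DAG(dag_id=cfg["dag_id"], ...)  # airflow.DAG
```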
Upon copying this file to S3, the DAG processor starts processing the new DAG file every second and ignores changes to the min_file_process_interval Airflow configuration. We've tried raising the value from 30 to 300 to slow down the processing, but the scheduler seems to ignore this parameter and continually reprocesses the DAG file. The downstream consequence is a flood of GET requests that is hammering our internal API.
Note that the MWAA cluster is processing other DAG files in the same S3 "directory" and respects the interval for them; this is only happening to this one file. Other than scheduler.min_file_process_interval and scheduler.parsing_processes, there are no other Airflow configuration overrides.
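For reference, these overrides are applied as MWAA configuration options; via the AWS CLI that looks roughly like the following (the environment name and parsing_processes value are placeholders):

```shell
aws mwaa update-environment \
  --name my-mwaa-env \
  --airflow-configuration-options \
    '{"scheduler.min_file_process_interval": "300", "scheduler.parsing_processes": "2"}'
```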
What could be causing this?
Sorry John, I do have it set as scheduler.min_file_process_interval. That was a typo on my part. Even if I had set the wrong setting, it should still have respected the 30-second wait before reprocessing, which it was not.