MWAA Scheduler does not respect file_min_process_interval


We have a 2.0.2 MWAA cluster running in us-east-1. We recently tried to implement dynamic DAG generation instead of Jinja templating our DAG files. To do so, we read from os.environ and use the requests library to make a GET request to an internal API service that returns the configurations for the DAGs we need to generate.

Upon copying this file to S3, the DAG processor starts processing the new DAG file every second and ignores changes to the min_file_process_interval Airflow configuration. We tried raising the value from 30 to 300 to slow the processing down, but the scheduler seems to ignore the parameter and keeps reprocessing the DAG file. As a result, the file makes a large number of GET requests, which is hammering our internal API.

Note that the MWAA cluster processes other DAG files at the configured interval, including files in the same S3 "directory". The problem affects only this one file. Other than scheduler.min_file_process_interval and scheduler.parsing_processes, there are no other Airflow configuration overrides.

What could be causing this?

rchui
asked a year ago, 316 views
1 Answer

The setting is scheduler.min_file_process_interval, not scheduler.file_min_process_interval

https://docs.aws.amazon.com/mwaa/latest/userguide/best-practices-tuning.html#best-practices-tuning-dag-folders
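Independently of how often the file is parsed, one way to protect the internal API is to cache the fetched configuration on local disk with a time-to-live, so that repeated parses reuse the cached copy. The following is only a minimal sketch under stated assumptions: `load_dag_configs`, `CACHE_PATH`, and `CACHE_TTL_SECONDS` are hypothetical names that do not come from the question, and the `fetch` callable stands in for the existing `requests` call.

```python
import json
import os
import tempfile
import time
from typing import Callable

# Hypothetical names for illustration only; none of them appear in the question.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "dag_configs_cache.json")
CACHE_TTL_SECONDS = 300


def load_dag_configs(
    fetch: Callable[[], list],
    cache_path: str = CACHE_PATH,
    ttl: float = CACHE_TTL_SECONDS,
) -> list:
    """Return DAG configs, calling fetch() only when the cache is older than ttl."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < ttl:
            with open(cache_path) as f:
                return json.load(f)
    except (OSError, ValueError):
        # Cache file is missing or unreadable: fall through and refetch.
        pass

    configs = fetch()

    # Write to a temporary file and rename it, so a concurrent parser
    # never reads a half-written cache file.
    tmp_path = f"{cache_path}.{os.getpid()}.tmp"
    with open(tmp_path, "w") as f:
        json.dump(configs, f)
    os.replace(tmp_path, cache_path)
    return configs
```

In the DAG file, `fetch` could be something like `lambda: requests.get(url, timeout=10).json()`, where `url` comes from whatever environment variable the file already reads. Each container that parses the file would keep its own cache, so this reduces the request rate rather than eliminating repeated calls, and it does not explain why the interval is being ignored.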

AWS
John_J
answered a year ago
  • Sorry John, I do have it set as scheduler.min_file_process_interval; that was a typo on my part. Even if I had used the wrong setting name, the scheduler should still have respected the 30-second wait before reprocessing, and it did not.
