MWAA Scheduler does not respect file_min_process_interval


We have an MWAA cluster on Airflow 2.0.2 running in us-east-1. We recently tried to implement dynamic DAG generation instead of Jinja-templating our DAG files. To do this, the DAG file reads from os.environ and uses the requests library to make a GET request to an internal API service, which returns the configurations for the DAGs we need to generate.

After we copy this file to S3, the DAG processor starts processing the new DAG file every second and ignores changes to the min_file_process_interval Airflow configuration. We've tried changing the value from 30 to 300 to slow down processing, but the scheduler seems to ignore this parameter and keeps reprocessing the DAG file. As a downstream consequence, every parse makes a GET request, which is hammering our internal API.
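Independently of how often the file is parsed, one way to limit the load on the API is to cache its response on local disk with a time-to-live, so that repeated parses within the window reuse the cached configurations. The following is a minimal sketch under stated assumptions, not the setup described above: `load_configs`, `CACHE_PATH`, and `CACHE_TTL_SECONDS` are hypothetical names, and `fetch` stands in for whatever callable performs the GET request.

```python
import json
import os
import tempfile
import time

# Hypothetical cache location and lifetime; both are assumptions for this sketch.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "dag_configs_cache.json")
CACHE_TTL_SECONDS = 300


def load_configs(fetch, cache_path=CACHE_PATH, ttl=CACHE_TTL_SECONDS, now=time.time):
    """Return cached configs while they are fresh; otherwise call fetch() and re-cache."""
    try:
        with open(cache_path) as f:
            cached = json.load(f)
        if now() - cached["fetched_at"] < ttl:
            return cached["configs"]
    except (OSError, ValueError, KeyError, TypeError):
        # Missing, unreadable, or malformed cache: fall through and fetch again.
        pass

    configs = fetch()

    # Write to a temporary file, then rename, so a concurrent parse
    # never reads a half-written cache file.
    tmp_path = f"{cache_path}.{os.getpid()}.tmp"
    with open(tmp_path, "w") as f:
        json.dump({"fetched_at": now(), "configs": configs}, f)
    os.replace(tmp_path, cache_path)
    return configs
```

In the DAG file this could be called as `load_configs(lambda: requests.get(os.environ["CONFIG_API_URL"], timeout=10).json())`, where `CONFIG_API_URL` is a hypothetical environment variable name. The cache is local to the host that parses the file, so each host that parses it would keep its own copy.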

Note that the MWAA cluster processes other DAG files, including files in the same S3 "directory", at the configured interval. The problem affects only this one file. Other than scheduler.min_file_process_interval and scheduler.parsing_processes, there are no other Airflow configuration overrides.

What could be causing this?

rchui
asked a year ago, 316 views
1 Answer

The setting is scheduler.min_file_process_interval, not scheduler.file_min_process_interval

https://docs.aws.amazon.com/mwaa/latest/userguide/best-practices-tuning.html#best-practices-tuning-dag-folders

AWS
John_J
answered a year ago
  • Sorry John, I do have it set as scheduler.min_file_process_interval; that was a typo on my part. Even if I had used the wrong setting name, the scheduler should still have respected the 30-second wait time before reprocessing, which it did not.
