How to allow only the CloudWatch metrics (CPU util/MEM util) for the MWAA base worker, additional worker, scheduler and web server?

0

Hi, my company recently started using MWAA and we noticed that our CloudWatch bill is spiking. I found that this might be due to the metrics pushed from MWAA, and I want to stop all of them except a few. I went through this doc, which explains how to choose the metrics: https://docs.aws.amazon.com/mwaa/latest/userguide/access-metrics-cw-202.html, but I'm not sure about the syntax/convention for naming the metrics, since the given example just says scheduler,executor,dagrun. Does this mean I can only allow/disable these three metric names? If not, can anyone point me to the docs where I can find the list of metrics, or is there a better way to handle this?

P.S. I also went through this question: https://repost.aws/questions/QUjdO4rRrBRwSEDMyxb4wJoA/mwaa-cloudwatch-metrics-billing-too-high but their solution was to disable all the metrics, while I want to retain some.

Edit: I added the configuration with metrics.metrics_block_list as the key, but I can still see the metrics in CloudWatch. (Screenshot: Configuration option in MWAA)

3 Answers
0

Hi, according to https://docs.aws.amazon.com/mwaa/latest/userguide/access-metrics-cw-202.html#choosing-metrics, if you set the configuration property metrics.metrics_allow_list with the value dagrun it will emit all the metrics whose name is prefixed with dagrun. Any other metric will be blocked and not be sent to CloudWatch.

On the same page, you can see the list of metrics whose name is prefixed with dagrun, there are 5 of them (dagrun.dependency-check.{dag_id}, dagrun.duration.failed.{dag_id}, dagrun.duration.success.{dag_id}, dagrun.schedule_delay.{dag_id} and dagrun.{dag_id}.first_task_scheduling_delay).

You should be able to use the list of metrics on that page to determine which metrics are the most relevant to you, and list their prefixes. For example, if the total DAG parse time (an Airflow gauge) is important to you, the page lists dag_processing.total_parse_time as its metric name. Anything before the first dot is the prefix, so the prefix you need to keep that metric is dag_processing. To keep that metric, you would set metrics.metrics_allow_list to be equal to (or to contain) the value dag_processing.
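To make the prefix rule above concrete, here is a minimal sketch of how such a filter could behave. It assumes the allow list is a comma-separated string applied as a plain "starts with" check, as described above; the function names are hypothetical and this is not the code MWAA or Airflow actually runs.

```python
def parse_allow_list(value):
    """Split a comma-separated allow list into a tuple of prefixes."""
    return tuple(part.strip() for part in value.split(",") if part.strip())


def is_emitted(metric_name, allow_list):
    """Return True if metric_name starts with one of the allowed prefixes."""
    prefixes = parse_allow_list(allow_list)
    # Assumption: an empty allow list means no filtering is applied.
    if not prefixes:
        return True
    # str.startswith accepts a tuple and matches any of its elements.
    return metric_name.startswith(prefixes)


if __name__ == "__main__":
    allow_list = "dagrun,dag_processing"
    for name in (
        "dagrun.duration.success.my_dag",
        "dag_processing.total_parse_time",
        "ti.finish.my_dag.my_task.success",
    ):
        print(name, is_emitted(name, allow_list))
```

With an allow list of dagrun,dag_processing, the two dagrun and dag_processing names pass the check and the ti.finish name does not.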

AWS
Jsc
answered 8 months ago
AWS
EXPERT
reviewed 8 months ago
  • Hi, thanks for trying to help. I added the configuration property, but I still see the metric in CloudWatch. Is there anything wrong with the way it is set up? I entered the values comma-separated, as mentioned in the docs. (I edited my original question to add a screenshot.)

0

Two questions:

  1. Do you want to block only ti.finish? Did you try blocking the ti prefix (it should prevent both ti.finish and ti.start from being published)?
  2. When you say it didn't work, do you mean that a new metric has been created with a name of the form ti.finish.{dag_id}.{task_id}.{state}, with new values for dag_id, task_id and state?
AWS
Jsc
answered 8 months ago
  • To answer your questions:

    1. I tried setting metrics.metrics_allow_list to executor.queued_tasks,executor.running,scheduler.tasks.running_tasks, as I decided the others are not needed at the moment.
    2. I can't validate whether a new metric has been created, as there are thousands of them at the moment, but for the existing metrics I see that Airflow is still pushing out data. E.g., I see the ti.finish metric getting updated.

    Does this mean I can only stop new metrics from being created, and cannot stop Airflow from pushing out data for existing metrics? In the other post (mwaa-cloudwatch-metrics-billing-too-high), the original poster mentioned that they stopped all the metrics and the changes were reflected within a day. But that post is about 9 months old and the commands they used seem to be different from those in the docs, so is there any new change that prevents stopping Airflow from publishing metrics?

0

Per https://airflow.apache.org/docs/apache-airflow/2.6.3/configurations-ref.html#metrics-allow-list and https://docs.aws.amazon.com/mwaa/latest/userguide/access-metrics-cw-202.html#choosing-metrics, you cannot allow underlying metrics such as executor.queued_tasks; you have to allow by group (i.e. to block all dagrun metrics output, you would omit dagrun and set metrics.metrics_allow_list to scheduler,executor).

For the list of metrics, see https://airflow.apache.org/docs/apache-airflow/2.6.3/administration-and-deployment/logging-monitoring/metrics.html
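As a rough sketch of how the group-level allow list above could be applied from a script rather than the console: the snippet below builds the configuration option and passes it to boto3's MWAA client through update_environment. The helper names and the environment name are hypothetical, and valid AWS credentials and permissions are assumed.

```python
def build_metrics_options(prefixes):
    """Build the Airflow configuration option for an allow list of metric groups."""
    return {"metrics.metrics_allow_list": ",".join(prefixes)}


def apply_to_environment(env_name, options):
    """Push the options to an MWAA environment (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("mwaa")
    # Assumption: this mapping is treated as the environment's full set of
    # Airflow configuration options, so include any options already in use.
    return client.update_environment(
        Name=env_name,
        AirflowConfigurationOptions=options,
    )


if __name__ == "__main__":
    # Keep the scheduler and executor groups; dagrun is left out on purpose.
    options = build_metrics_options(["scheduler", "executor"])
    print(options)
    # apply_to_environment("my-mwaa-environment", options)  # hypothetical name
```

The update call is left commented out so the script can be run safely to inspect the option before changing a real environment.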

AWS
John_J
answered 8 months ago
  • Thanks for pointing out that I cannot allow underlying metrics, but can I block TaskInstance (ti) metrics like ti.finish?

    I ask because in the docs and in your comment I only see executor, scheduler and dagrun.

    Are these three the only available prefixes that can be blocked? Or can I block prefixes like pool, ti, dag_processing and so on, which are listed under the Apache Airflow metric column in https://docs.aws.amazon.com/mwaa/latest/userguide/access-metrics-cw-202.html#choosing-metrics ?
