Missing metric value on MSK Prometheus

We have an MSK cluster with Prometheus monitoring enabled, and we monitor the consumer lag on different topics on a Grafana dashboard. The metrics are collected every minute by a Prometheus Docker container and sent to our Prometheus workspace. I've noticed that the exporter does not return the lag value every time we call it. As you can see in this screenshot, some values of the consumer lag are missing. This is a problem because consumer lag is a critical metric that we need to be aware of at all times. We are using the metric kafka_consumer_group_ConsumerLagMetrics_Value to get this value. Even when running curl against the MSK exporter URL, the call sometimes returns a value and sometimes does not (when grepping for a specific GroupId that we want to monitor).

[Screenshot: Grafana dashboard showing gaps in the consumer lag values]

1 Answer
Hi, can you please check the points below to troubleshoot this issue?

  1. If this is happening only for the consumer lag metrics, please check whether the consumer groups are active and running consistently. Inactive or unstable consumers may lead to missing metrics.
  2. Have you checked the logs of the Prometheus exporter for errors?
  3. If the cause is a resource constraint, adjust the scrape timeout and scrape interval for reliable data collection.
  4. Please check the scrape timeout in the Prometheus configuration. If the default timeout is too low for the metrics collection, increase it and validate the metrics collection again.
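
To tell an exporter-side gap apart from a scrape timeout, it can help to repeat the curl-and-grep check from the question in a loop and record how long each request takes. The following is only a minimal sketch: the helper names `lag_samples` and `poll` are hypothetical, the group id is matched as a plain substring (like grep) because the exact label layout of the metric is not shown in this thread, and the URL you pass in is assumed to be the broker's JMX exporter endpoint (port 11001 with MSK open monitoring).

```python
import time
import urllib.request

# Metric name taken from the question.
METRIC = "kafka_consumer_group_ConsumerLagMetrics_Value"


def lag_samples(exposition_text, group_id):
    """Return the sample lines for METRIC that mention group_id (like grep)."""
    return [
        line
        for line in exposition_text.splitlines()
        # "# HELP" / "# TYPE" comment lines start with "#", so they are skipped.
        if line.startswith(METRIC) and group_id in line
    ]


def poll(url, group_id, attempts=30, interval_s=10.0, timeout_s=10.0):
    """Fetch the exporter repeatedly and count the fetches with no lag sample."""
    missing = 0
    for _ in range(attempts):
        started = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                body = resp.read().decode("utf-8", errors="replace")
            found = lag_samples(body, group_id)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            found = []
            print(f"request failed: {exc}")
        elapsed = time.monotonic() - started
        if not found:
            missing += 1
        print(f"took {elapsed:.2f}s, samples={len(found)}")
        time.sleep(interval_s)
    return missing
```

For example, `poll("http://<broker-host>:11001/metrics", "<your group id>")` prints one line per request. If the slow requests are the ones with `samples=0`, the scrape timeout is the more likely cause (points 3 and 4); if fast, successful responses also lack the metric, the consumer group state (point 1) is the better place to look.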
AWS
answered 7 months ago
