Missing metric value on MSK Prometheus

We have an MSK cluster with Prometheus monitoring enabled. We monitor consumer lag on several topics in a Grafana dashboard. The metrics are collected every minute by a Prometheus Docker container and sent to our Prometheus workspace. I've noticed that the exporter does not return the lag value every time we call it. As the screenshot shows, some consumer lag values are missing. This is a problem because consumer lag is a critical metric that we need to be aware of at all times. We use the metric kafka_consumer_group_ConsumerLagMetrics_Value to get this value. Even when we curl the MSK exporter URL directly and grep for a specific GroupId that we want to monitor, the call sometimes returns a value and sometimes does not.

[Screenshot: Grafana dashboard panel with gaps in the consumer lag series]

1 Answer

Hi, can you please check the points below to troubleshoot this issue?

  1. If this happens only for the consumer lag metrics, check whether the consumer groups are active and running consistently. Inactive or unstable consumers can lead to missing metrics.
  2. Check the logs of the Prometheus exporter for errors.
  3. If the cause is a resource constraint, adjust the scrape timeout and scrape interval so that data collection is reliable.
  4. Check the scrape timeout (scrape_timeout) in the Prometheus configuration. If the current timeout is too low for the metrics collection to finish, increase it and verify that the metrics are then collected on every scrape.
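
To tell a slow or failing scrape apart from a metric that is absent from an otherwise successful response, the curl-and-grep check from the question can be repeated in a loop. The following is a minimal sketch, not an official tool: the helper names lag_samples and poll are hypothetical, the group is matched by plain substring (like grep) because the exact label layout of the metric is not shown in the question, and the URL passed to poll is whatever exporter endpoint you already curl.

```python
import time
import urllib.request

# Metric name taken from the question.
METRIC = "kafka_consumer_group_ConsumerLagMetrics_Value"


def lag_samples(exposition_text, group_id, metric=METRIC):
    """Return the sample lines for `metric` that mention `group_id`.

    Plain substring match, equivalent to grepping the curl output.
    Lines starting with '#' (HELP/TYPE) are excluded by the prefix check.
    """
    return [
        line
        for line in exposition_text.splitlines()
        if line.startswith(metric) and group_id in line
    ]


def poll(url, group_id, attempts=10, interval_s=5.0, timeout_s=10.0):
    """Scrape `url` repeatedly and return how many scrapes contained the metric."""
    hits = 0
    for _ in range(attempts):
        started = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                body = resp.read().decode("utf-8", errors="replace")
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            print(f"scrape failed: {exc}")
            body = ""
        elapsed = time.monotonic() - started
        found = lag_samples(body, group_id)
        hits += bool(found)
        print(f"{elapsed:.2f}s, {len(found)} matching sample(s)")
        time.sleep(interval_s)
    return hits
```

Calling poll("<exporter-url>", "<group-id>") prints the duration and match count of each scrape. If the misses coincide with durations close to the timeout, points 3 and 4 are the likely cause; if the scrape is fast and succeeds but the group is simply absent, point 1 is the more likely explanation.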
AWS
answered 6 months ago
