Intermittent DatasourceNoData alerts in Managed Grafana for a CloudWatch data source


Summary: I get intermittent DatasourceNoData alerts for a CloudWatch data source in Managed Grafana, but I don't get those errors from (almost) identical alerts in a Grafana instance hosted on EC2.

Background

I have an Amazon Managed Service for Grafana (AMG) workspace where I have configured an AWS CloudWatch data source, following the AWS documentation Use AWS data source configuration to add CloudWatch as a data source. Using this data source I have set up a number of alerts monitoring CPUUtilization for several ECS services; each alert fires if the average CPU over the last 2 minutes exceeds a certain threshold. I have similar alerts for memory utilization. The alerts evaluate every 30 seconds, with a 4-minute pending period. In total there are 14 such alerts monitoring CPU and memory utilization for 12 services.
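
For reference, each alert boils down to a CloudWatch query along the lines of the sketch below. The cluster and service names and the region are placeholders, and the query is of course configured through the Grafana CloudWatch data source UI rather than boto3; this is just a roughly equivalent GetMetricData call.

```python
import boto3
from datetime import datetime, timedelta, timezone

# Rough equivalent of one CPUUtilization alert query (placeholder names/region).
cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            "Id": "cpu",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ECS",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [
                        {"Name": "ClusterName", "Value": "my-cluster"},  # placeholder
                        {"Name": "ServiceName", "Value": "my-service"},  # placeholder
                    ],
                },
                "Period": 60,
                "Stat": "Average",
            },
        }
    ],
    StartTime=now - timedelta(minutes=2),  # the "last 2 minutes" window from the alert
    EndTime=now,
)
# The alert condition is based on values like these exceeding the threshold.
print(response["MetricDataResults"][0]["Values"])
```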

I also have an older Grafana service running on an EC2 instance. It uses CloudWatch as a data source and has the same alerts with the same configuration; the only difference is that there are fewer of them, covering two services instead of twelve. The AMG workspace is meant to replace this old Grafana instance.

Problem

Since I created the alerts in the new AMG workspace a few weeks ago, some or all of them intermittently fire with the state NoData and then resolve within a few minutes. There does not seem to be anything wrong with the services the alerts monitor. This has never happened to the alerts in the old Grafana instance, even though they use the same CloudWatch data source and are configured the same way (there are just fewer of them).

I have looked into whether I am exceeding an AWS service quota or something similar, but since both instances are connected to the same AWS account, wouldn't that cause the same error to show up in the old Grafana instance at least occasionally, rather than only in the new AMG workspace?
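
For the quota check, I looked at the CloudWatch API call volume roughly along these lines (a sketch; I am assuming GetMetricData is the relevant API and that its usage is published under the AWS/Usage dimensions below, and the region is a placeholder):

```python
import boto3
from datetime import datetime, timedelta, timezone

# Sketch: sum of CloudWatch GetMetricData calls per 5 minutes over the last day,
# to see whether the alert evaluations come anywhere near an API quota.
cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")  # placeholder region

now = datetime.now(timezone.utc)
usage = cloudwatch.get_metric_statistics(
    Namespace="AWS/Usage",
    MetricName="CallCount",
    Dimensions=[
        {"Name": "Type", "Value": "API"},
        {"Name": "Service", "Value": "CloudWatch"},
        {"Name": "Resource", "Value": "GetMetricData"},
        {"Name": "Class", "Value": "None"},
    ],
    StartTime=now - timedelta(days=1),
    EndTime=now,
    Period=300,
    Statistics=["Sum"],
)
for point in sorted(usage["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```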

What could be the problem and how can I debug this without access to logs in Managed Grafana? I'm grateful for any suggestions.

LilyB
asked 4 months ago · 954 views
1 Answer

Good day,

From your description, it does not sound like a quota-related issue.

I would suggest checking for data points for the same metric within the CloudWatch console. Are you able to see data points for the same metrics and resources at the times the NoData errors show up in Grafana? If the graphs in Grafana and CloudWatch match, then the service that the metric belongs to is not sending the metric data, since CloudWatch is a push-based monitoring service and does not poll its sources for data.
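
If the console graphs are hard to compare, a quick alternative is to list the raw data points for one of the affected metrics over a NoData interval directly from the CloudWatch API. A minimal sketch, assuming an ECS CPUUtilization metric with placeholder names, region, and time window:

```python
import boto3
from datetime import datetime, timezone

# Sketch: list raw CPUUtilization data points around one NoData interval
# (placeholder names and times; substitute the affected service and alert window).
cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "my-cluster"},
        {"Name": "ServiceName", "Value": "my-service"},
    ],
    StartTime=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),  # placeholder window
    EndTime=datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc),
    Period=60,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
# Gaps in these timestamps would mean the metric itself had no data,
# rather than Grafana failing to retrieve it.
```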

AWS
SUPPORT ENGINEER
answered 3 months ago
  • Thanks for your suggestion! I checked the affected metrics in CloudWatch (AWS Console => CloudWatch => Metrics) for some of the time intervals where the alerts fired due to DatasourceNoData, but I could not see anything suggesting that the metrics were missing data for those intervals. Also, I have identical alerts set up in the old Grafana instance hosted on EC2 that I am hoping to migrate from, and those did not fire, which suggests that the metric itself is not the problem.

    The only thing I know differs between the new Managed Grafana instance and the old Grafana instance on EC2 is that the new instance has alerts covering more ECS services; that is why I was thinking a limit on the number of connections or API calls, or something similar, could be the culprit.
