Agent configuration for Kafka through JMX not working

Hi,

I am facing an issue with Kafka monitoring. Following this document, https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Solution-Kafka-On-EC2.html, I have set up the JMX configuration.

While setting up the JMX section of the CloudWatch agent configuration file, I am getting this error:

2025/04/25 13:14:46 Under path: /metrics/metrics_collected/jmx | Error: Invalid type. Expected: object, given: array
2025/04/25 13:14:46 Configuration validation first phase failed. Agent version: 1.0. Verify the JSON input is only using features supported by this version.

Can someone please help?

Here is my agent configuration file:

{
  "agent": {
    "metrics_collection_interval": 60,
    "debug": true,
    "aws_sdk_log_level": "LogDebug"
  },
  "logs": {
    "logs_collected": {
      "windows_events": {
        "collect_list": [
          {
            "event_format": "text",
            "event_levels": [
              "WARNING",
              "ERROR",
              "CRITICAL"
            ],
            "event_name": "System",
            "log_group_name": "Windows-SystemEvent",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "jmx": [
        {
          "endpoint": "localhost:9999",
          "kafka": {
            "measurement": [
              "kafka.message.count",
              "kafka.request.time.99p",
              "kafka.request.time.50p",
              "kafka.request.time.total",
              "kafka.request.time.avg",
              "kafka.request.failed",
              "kafka.request.count",
              "kafka.purgatory.size",
              "kafka.partition.count",
              "kafka.unclean.election.rate",
              "kafka.max.lag",
              "kafka.controller.active.count",
              "kafka.request.queue",
              "kafka.partition.under_replicated",
              "kafka.partition.offline",
              "kafka.network.io",
              "kafka.leader.election.rate",
              "kafka.isr.operation.count",
              "kafka.logs.flush.time.count",
              "kafka.logs.flush.time.median",
              "kafka.logs.flush.time.99p"
            ]
          },
          "append_dimensions": {
            "ClusterName": "kafka-cluster"
          }
        },
        {
          "endpoint": "localhost:9999",
          "kafka-producer": {
            "measurement": [
              "kafka.producer.io-wait-time-ns-avg",
              "kafka.producer.outgoing-byte-rate",
              "kafka.producer.compression-rate",
              "kafka.producer.request-rate",
              "kafka.producer.byte-rate",
              "kafka.producer.request-latency-avg",
              "kafka.producer.response-rate",
              "kafka.producer.record-error-rate",
              "kafka.producer.record-send-rate",
              "kafka.producer.record-retry-rate"
            ]
          },
          "append_dimensions": {
            "ProducerGroupName": "kafka-producer"
          }
        },
        {
          "endpoint": "localhost:9999",
          "kafka-consumer": {
            "measurement": [
              "kafka.consumer.fetch-rate",
              "kafka.consumer.total.fetch-size-avg",
              "kafka.consumer.total.records-consumed-rate",
              "kafka.consumer.fetch-size-avg",
              "kafka.consumer.total.bytes-consumed-rate",
              "kafka.consumer.records-consumed-rate",
              "kafka.consumer.bytes-consumed-rate",
              "kafka.consumer.records-lag-max"
            ]
          },
          "append_dimensions": {
            "ConsumerGroupName": "kafka-consumer"
          }
        },
        {
          "endpoint": "localhost:9999",
          "jvm": {
            "measurement": [
              "jvm.classes.loaded",
              "jvm.gc.collections.count",
              "jvm.gc.collections.elapsed",
              "jvm.memory.heap.init",
              "jvm.memory.heap.committed",
              "jvm.memory.heap.max",
              "jvm.memory.heap.used",
              "jvm.memory.nonheap.init",
              "jvm.memory.nonheap.committed",
              "jvm.memory.nonheap.max",
              "jvm.memory.nonheap.used",
              "jvm.threads.count",
              "jvm.memory.pool.init",
              "jvm.memory.pool.max",
              "jvm.memory.pool.used",
              "jvm.memory.pool.committed"
            ]
          },
          "append_dimensions": {
            "ProcessGroupName": "kafka-jvm"
          }
        }
      ],
      "LogicalDisk": {
        "measurement": [
          {
            "name": "% Free Space",
            "unit": "Percent"
          }
        ],
        "resources": [
          "*"
        ],
        "metrics_collection_interval": 60
      },
      "Memory": {
        "measurement": [
          "% Committed Bytes In Use"
        ],
        "metrics_collection_interval": 60
      },
      "Paging File": {
        "measurement": [
          "% Usage"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "PhysicalDisk": {
        "measurement": [
          "% Disk Time",
          "Disk Write Bytes/sec",
          "Disk Read Bytes/sec",
          "Disk Writes/sec",
          "Disk Reads/sec"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "Processor": {
        "measurement": [
          "% User Time",
          "% Idle Time",
          "% Interrupt Time"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "_Total"
        ]
      },
      "Network Interface": {
        "measurement": [
          "Bytes Sent/sec",
          "Bytes Received/sec",
          "Packets Sent/sec",
          "Packets Received/sec"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "TCPv4": {
        "measurement": [
          "Connections Established"
        ],
        "metrics_collection_interval": 60
      },
      "TCPv6": {
        "measurement": [
          "Connections Established"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}
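For quick local debugging, the validator's complaint can be reproduced with a few lines of Python that inspect the shape of the `jmx` node (a standalone sketch; the inline fragment stands in for the full file above):

```python
import json

# Minimal reproduction of the shape the validator complains about:
# under /metrics/metrics_collected/jmx this config supplies an array
# (a list of target blocks), while older agents expect a single object.
config_fragment = json.loads("""
{
  "metrics": {
    "metrics_collected": {
      "jmx": [
        {"endpoint": "localhost:9999",
         "kafka": {"measurement": ["kafka.message.count"]}}
      ]
    }
  }
}
""")

jmx = config_fragment["metrics"]["metrics_collected"]["jmx"]
print("array" if isinstance(jmx, list) else "object")  # → array
```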
2 Answers
Accepted Answer

I solved it by updating the CloudWatch agent.

The CloudWatch agent version must be 1.300055.0b1095 or later. With a sufficiently new agent, the same configuration above works; expose the JMX metrics as described in this document: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Solution-Kafka-On-EC2.html
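As a quick sanity check, the installed build can be compared against that minimum; a minimal sketch in Python (the version-string format is an assumption inferred from the value above, and how you obtain the installed version on your host is up to you):

```python
import re

def parse_agent_version(version: str):
    # Versions look like "1.300055.0b1095": major.build.patch
    # plus an optional "b<n>" suffix.
    m = re.match(r"(\d+)\.(\d+)\.(\d+)(?:b(\d+))?$", version)
    if not m:
        raise ValueError(f"unrecognized version string: {version}")
    major, build, patch, b = m.groups()
    return (int(major), int(build), int(patch), int(b or 0))

# Minimum version with array-style "jmx" support, per the answer above.
MIN_JMX_VERSION = parse_agent_version("1.300055.0b1095")

def supports_jmx_array(version: str) -> bool:
    # True when the installed agent is at or above the minimum build.
    return parse_agent_version(version) >= MIN_JMX_VERSION

print(supports_jmx_array("1.247350.0"))       # → False (older agent)
print(supports_jmx_array("1.300055.0b1095"))  # → True  (minimum build)
```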

answered 15 days ago

The error you're encountering is caused by the structure of the JMX section in your CloudWatch agent configuration file: the "jmx" field expects an object, but you've provided an array.

To resolve this, change the structure of your "jmx" configuration: instead of an array, use an object with a single key (e.g., "metrics_collection") that contains the array of JMX configurations.

Here's how you can modify the "jmx" section of your configuration:

"metrics_collected": {
  "jmx": {
    "metrics_collection": [
      {
        "endpoint": "localhost:9999",
        "kafka": {
          "measurement": [
            "kafka.message.count",
            ...
          ]
        },
        "append_dimensions": {
          "ClusterName": "kafka-cluster"
        }
      },
      {
        "endpoint": "localhost:9999",
        "kafka-producer": {
          "measurement": [
            "kafka.producer.io-wait-time-ns-avg",
            ...
          ]
        },
        "append_dimensions": {
          "ProducerGroupName": "kafka-producer"
        }
      },
      ...
    ]
  },
  "LogicalDisk": {
    ...
  },
  ...
}

This change should resolve the configuration validation error you're seeing. After making this modification, restart the CloudWatch agent. If you still encounter issues, double-check that the rest of your configuration file is correctly formatted and that you're running the latest version of the CloudWatch agent.

Remember to use the fetch-config option when restarting the agent to ensure it picks up your new configuration:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/path/to/your/config.json

If problems persist, review the CloudWatch agent logs for any additional error messages that might provide more insight into the issue.
Sources
Issue with CloudWatch agent configuration | AWS re:Post
Collect Java Management Extensions (JMX) metrics - Amazon CloudWatch
Troubleshooting the CloudWatch agent - Amazon CloudWatch

answered 20 days ago
