
CloudWatch logs not appearing even though the CloudWatch agent on EC2 seems to work


Hi,

I am trying to do something basic: pipe logs from a process running on an EC2 instance to CloudWatch. I followed the CloudWatch tutorials for installing, configuring, and starting the agent on the EC2 instance, setting up an IAM role for the instance, etc., but:

  • No log group with the name specified in the agent config appears in the CloudWatch console (I have checked that the IAM role includes permission to create log groups and log streams).
  • Even after I manually created the log group (and stream), no logs appear in the console. I have checked that I am looking at the correct region.

The CloudWatch agent's own log contains some errors where it lacks local permissions to determine disk usage, but I don't see how those could be related to the missing logs. I have also verified that the agent has access to the log file it is supposed to be tailing. Finally, the EC2 instances allow outbound traffic, since they need to talk to the wider internet.

Does anyone have ideas on how to troubleshoot this?

Here are the logs produced by the agent:

2024/07/23 08:00:00 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2024/07/23 08:00:00 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2024/07/23 08:00:01 I! Valid Json input schema.
2024/07/23 08:00:01 I! Detected runAsUser: cwagent
2024/07/23 08:00:01 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 997:997
2024/07/23 08:00:01 I! Set HOME: /home/cwagent
2024-07-23T08:00:01Z I! Starting AmazonCloudWatchAgent CWAgent/1.300042.0b733 (go1.22.5; linux; amd64) with log file /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log with log target lumberjack
2024-07-23T08:00:01Z I! AWS SDK log level not set
2024-07-23T08:00:01Z I! creating new logs agent
2024-07-23T08:00:01Z I! [logagent] starting
2024-07-23T08:00:01Z I! [logagent] found plugin cloudwatchlogs is a log backend
2024-07-23T08:00:01Z I! [logagent] found plugin logfile is a log collection
2024-07-23T08:00:01Z I! [logagent] start logs plugin file paths [/var/lib/docker/containers/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f-json.log]
2024-07-23T08:00:01Z I! [inputs.logfile] turned on logs plugin
2024-07-23T08:00:01Z I! {"caller":"service@v0.98.0/telemetry.go:47","msg":"Skipping telemetry setup.","address":"","level":"None"}
2024-07-23T08:00:01Z I! {"caller":"awsxrayreceiver@v0.98.0/receiver.go:45","msg":"Going to listen on endpoint for X-Ray segments","kind":"receiver","name":"awsxray","data_type":"traces","udp":"127.0.0.1:2000"}
2024-07-23T08:00:01Z I! {"caller":"udppoller/poller.go:95","msg":"Listening on endpoint for X-Ray segments","kind":"receiver","name":"awsxray","data_type":"traces","udp":"127.0.0.1:2000"}
2024-07-23T08:00:01Z I! {"caller":"awsxrayreceiver@v0.98.0/receiver.go:56","msg":"Listening on endpoint for X-Ray segments","kind":"receiver","name":"awsxray","data_type":"traces","udp":"127.0.0.1:2000"}
2024-07-23T08:00:01Z I! {"caller":"service@v0.98.0/service.go:143","msg":"Starting CWAgent...","Version":"1.300042.0b733","NumCPU":8}
2024-07-23T08:00:01Z I! {"caller":"extensions/extensions.go:34","msg":"Starting extensions..."}
2024-07-23T08:00:01Z I! {"caller":"extensions/extensions.go:37","msg":"Extension is starting...","kind":"extension","name":"agenthealth/traces"}
2024-07-23T08:00:01Z I! {"caller":"extensions/extensions.go:52","msg":"Extension started.","kind":"extension","name":"agenthealth/traces"}
2024-07-23T08:00:01Z I! {"caller":"extensions/extensions.go:37","msg":"Extension is starting...","kind":"extension","name":"agenthealth/metrics"}
2024-07-23T08:00:01Z I! {"caller":"extensions/extensions.go:52","msg":"Extension started.","kind":"extension","name":"agenthealth/metrics"}
2024-07-23T08:00:01Z I! cloudwatch: get unique roll up list [[InstanceId]]
2024-07-23T08:00:01Z I! {"caller":"ec2tagger/ec2tagger.go:415","msg":"ec2tagger: Check EC2 Metadata.","kind":"processor","name":"ec2tagger","pipeline":"metrics/host"}
2024-07-23T08:00:01Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 42.599937511s
2024-07-23T08:00:01Z I! {"caller":"ec2tagger/ec2tagger.go:333","msg":"ec2tagger: EC2 tagger has started initialization.","kind":"processor","name":"ec2tagger","pipeline":"metrics/host"}
2024-07-23T08:00:01Z I! Started the statsd service on :8125
2024-07-23T08:00:01Z I! {"caller":"awsxrayreceiver@v0.98.0/receiver.go:90","msg":"X-Ray TCP proxy server started","kind":"receiver","name":"awsxray","data_type":"traces"}
2024-07-23T08:00:01Z I! {"caller":"service@v0.98.0/service.go:169","msg":"Everything is ready. Begin running and processing data."}
2024-07-23T08:00:01Z W! {"caller":"localhostgate/featuregate.go:63","msg":"The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default.","feature gate ID":"component.UseLocalHostAsDefaultHost"}
2024-07-23T08:00:01Z I! Statsd listener listening on:  [::]:8125
2024-07-23T08:00:01Z I! {"caller":"ec2tagger/ec2tagger.go:480","msg":"ec2tagger: Initial retrieval of tags succeeded","kind":"processor","name":"ec2tagger","pipeline":"metrics/host"}
2024-07-23T08:00:01Z I! {"caller":"ec2tagger/ec2tagger.go:391","msg":"ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes","kind":"processor","name":"ec2tagger","pipeline":"metrics/host"}
2024-07-23T08:00:02Z E! [inputs.disk] [SystemPS] => error getting disk usage ("/var/lib/docker/overlay2/bdd6eaaf2eefb158c14e053fd49308a08009c83e7fd6dc622f4cef80702be676/merged"): permission denied
2024-07-23T08:00:02Z E! [inputs.disk] [SystemPS] => error getting disk usage ("/run/docker/netns/daa3427e69e9"): permission denied

Here is my config:

{
	"agent": {
		"metrics_collection_interval": 60,
		"run_as_user": "cwagent"
	},
	"logs": {
		"logs_collected": {
			"files": {
				"collect_list": [
					{
						"file_path": "/var/lib/docker/containers/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f-json.log",
						"log_group_class": "STANDARD",
						"log_group_name": "reconstruction-worker-test-log",
						"log_stream_name": "{instance_id}",
						"retention_in_days": -1
					}
				]
			}
		}
	},
	"metrics": {
		"aggregation_dimensions": [
			[
				"InstanceId"
			]
		],
		"append_dimensions": {
			"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
			"ImageId": "${aws:ImageId}",
			"InstanceId": "${aws:InstanceId}",
			"InstanceType": "${aws:InstanceType}"
		},
		"metrics_collected": {
			"disk": {
				"measurement": [
					"used_percent"
				],
				"metrics_collection_interval": 60,
				"resources": [
					"*"
				]
			},
			"mem": {
				"measurement": [
					"mem_used_percent"
				],
				"metrics_collection_interval": 60
			},
			"statsd": {
				"metrics_aggregation_interval": 60,
				"metrics_collection_interval": 10,
				"service_address": ":8125"
			}
		}
	},
	"traces": {
		"buffer_size_mb": 3,
		"concurrency": 8,
		"insecure": false,
		"traces_collected": {
			"xray": {
				"bind_address": "127.0.0.1:2000",
				"tcp_proxy": {
					"bind_address": "127.0.0.1:2000"
				}
			}
		}
	}
}
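With this config the agent tails the file as the user named in "run_as_user" (cwagent here, matching "Detected runAsUser: cwagent" in the log above), so that user, not root, is the one that needs access to each file. A minimal Python sketch, with hypothetical helper names, that pulls the user and the tailed files out of a config shaped like this one:

```python
import json
from typing import List, Optional, Tuple


def run_as_user(config: dict) -> Optional[str]:
    """The "run_as_user" value from the agent section, or None if it is not set."""
    return config.get("agent", {}).get("run_as_user")


def collected_files(config: dict) -> List[Tuple[str, str]]:
    """(file_path, log_group_name) for every entry in logs.logs_collected.files.collect_list."""
    collect_list = (
        config.get("logs", {})
        .get("logs_collected", {})
        .get("files", {})
        .get("collect_list", [])
    )
    return [(entry["file_path"], entry.get("log_group_name", "")) for entry in collect_list]


def load_config(path: str) -> dict:
    """Read an agent JSON config from disk."""
    with open(path) as fh:
        return json.load(fh)
```

Each path can then be checked as that user, for example with sudo -u cwagent stat <path>; a "Permission denied" there reproduces what the agent itself runs into.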
1 Answer

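I figured out the problem: the CloudWatch agent, which runs as cwagent (see "run_as_user" in the config), did not have permission to read the log file it was supposed to be tailing. This was tricky to spot because the file itself had 0777 permissions, but its ancestor directories did not grant cwagent access. To open or even stat a file, a user needs execute (search) permission on every directory in its path, so a world-readable file inside a restricted directory is still unreachable.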
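Checking only the file's own mode is what makes this easy to miss. A minimal Python sketch (the function name is hypothetical) that walks every component of a path and reports which ones a given uid and set of group ids cannot get through. It assumes classic owner/group/other mode bits only; POSIX ACLs, SELinux/AppArmor and Linux capabilities are ignored, and it should be run as root so that os.stat() itself is not blocked.

```python
import os
import stat
from pathlib import Path
from typing import List, Set


def blocking_components(path: str, uid: int, gids: Set[int]) -> List[str]:
    """List the components of `path` that uid/gids cannot get through.

    Directories need execute (search) permission; the final file needs read
    permission. Only mode bits are evaluated (no ACLs, SELinux, capabilities).
    """
    if uid == 0:
        return []  # root (with default capabilities) bypasses these checks
    target = Path(path).resolve()
    blocked = []
    # target.parents is nearest-first, so reverse it to walk from "/" down.
    for component in list(target.parents)[::-1] + [target]:
        st = os.stat(component)
        # Exactly one class applies: owner if uid matches, else group, else other.
        if st.st_uid == uid:
            read_bit, search_bit = stat.S_IRUSR, stat.S_IXUSR
        elif st.st_gid in gids:
            read_bit, search_bit = stat.S_IRGRP, stat.S_IXGRP
        else:
            read_bit, search_bit = stat.S_IROTH, stat.S_IXOTH
        needed = read_bit if component == target else search_bit
        if not st.st_mode & needed:
            blocked.append(str(component))
    return blocked
```

To check the agent user, pass the uid from pwd.getpwnam("cwagent").pw_uid and the group ids from os.getgrouplist("cwagent", pw_gid). An empty result means the mode bits along the whole path allow the read.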

I tracked this down by enabling debug mode ("debug": true in the "agent" section of the amazon-cloudwatch-agent.json config), as suggested in this doc: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/troubleshooting-CloudWatch-Agent.html#CloudWatch-Agent-troubleshooting-loginfo
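Turning on debug mode is a one-key change to the JSON config. A minimal Python sketch, with hypothetical helper names, that sets "debug": true in the agent section without touching the rest of the file. It assumes the config is reloaded afterwards, for example by re-running amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:<path to the config>, after which the debug ("D!") lines appear in /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log.

```python
import json
from pathlib import Path


def enable_debug(config: dict) -> dict:
    """Return a copy of `config` with "debug": true set in its "agent" section."""
    updated = json.loads(json.dumps(config))  # deep copy of plain JSON data
    updated.setdefault("agent", {})["debug"] = True
    return updated


def enable_debug_in_file(path: str) -> None:
    """Rewrite the JSON config at `path` in place with debug logging enabled."""
    p = Path(path)
    config = json.loads(p.read_text())
    p.write_text(json.dumps(enable_debug(config), indent=4) + "\n")
```

Returning a copy rather than mutating the input keeps the original config available, which makes it easy to switch debug mode back off once the problem is found.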

The debug logs showed that the agent could not even stat the file it was supposed to be tailing:

2024-07-23T08:38:25Z D! [logagent] open file count, 0
2024-07-23T08:38:25Z D! Stat file /var/lib/docker/containers/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f-json.log failed due to stat /var/lib/docker/containers/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f-json.log: permission denied
2024-07-23T08:38:26Z D! [logagent] open file count, 0
2024-07-23T08:38:26Z D! Stat file /var/lib/docker/containers/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f-json.log failed due to stat /var/lib/docker/containers/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f-json.log: permission denied
2024-07-23T08:38:27Z D! [logagent] open file count, 0
2024-07-23T08:38:27Z D! Stat file /var/lib/docker/containers/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f-json.log failed due to stat /var/lib/docker/containers/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f/d93b1346b7e8553117508f4e1f80590edb6040c93f81dd58dec5b2930e672f1f-json.log: permission denied
2024-07-23T08:38:27Z D! {"caller":"adapter/receiver.go:61","msg":"Begin scraping metrics with adapter","kind":"receiver","name":"telegraf_statsd","data_type":"metrics","receiver":"statsd"}
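Because the stat failure repeats every second, it is easy to lose among other debug output. A minimal Python sketch (the function name is hypothetical) that scans the agent log for lines in the "Stat file ... failed due to ..." format shown above and collects each failing path with its last reported error. It assumes paths contain no spaces and that the line format matches this agent version.

```python
import re
from typing import Dict, Iterable

# Matches debug lines like the ones above:
# "... D! Stat file <path> failed due to stat <path>: permission denied"
STAT_FAILURE = re.compile(r"D! Stat file (\S+) failed due to (.+)$")


def stat_failures(lines: Iterable[str]) -> Dict[str, str]:
    """Map each path the agent could not stat to the last error reported for it."""
    failures: Dict[str, str] = {}
    for line in lines:
        match = STAT_FAILURE.search(line.rstrip("\n"))
        if match:
            failures[match.group(1)] = match.group(2)
    return failures
```

Passing an open handle to /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log works, since a file object iterates line by line.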
answered 2 years ago
