CloudWatch metrics agent ignores /var/lib/docker even though it's configured


I have two EC2 instances with the amazon-cloudwatch-agent installed and configured. They work as expected except that on the CloudWatch metrics console, I do not see disk utilization info for /var/lib/docker. I have 13 docker containers at that mount point, but I'm unable to get any metrics on them.

In

/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml 

I have this:

[[inputs.disk]]
    fieldpass = ["used_percent", "inodes_free"]
    interval = "10s"
    mount_points = ["/dev", "/dev/shm", "/run", "/run/user/1000", "/sys/fs/cgroup", "/", "/var/lib/docker"]
    tagexclude = ["mode"]
    [inputs.disk.tags]
      "aws:StorageResolution" = "true"
      metricPath = "metrics"

In the metrics config file I have this:

"metrics": {
                "metrics_collected": {
                        "disk": {
                                "measurement": [
                                        "used_percent",
                                        "inodes_free"
                                ],
                                "metrics_collection_interval": 10,
                                "resources": [
                                        "/dev",
                                        "/dev/shm",
                                        "/run",
                                        "/run/user/1000",
                                        "/sys/fs/cgroup",
                                        "/",
                                        "/var/lib/docker"
                                ]
                        },

I stopped the agent service and started it with

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/path/to/my/config/file

In the CloudWatch metrics console, either under All Metrics or in a dashboard widget that I created, in the CWAgent namespace, I see all the metrics I configured except the disk utilization for /var/lib/docker.

Is that excluded on purpose for some reason? It's the only disk utilization metric I actually want to monitor, and it's the only configured metric not available.

mleone
asked a year ago (628 views)
3 Answers
Accepted Answer

Is /var/lib/docker its own filesystem, or is it part of the root filesystem?

According to the Linux section of https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html:

 resources – Optional. Specifies an array of disk mount points. This field limits CloudWatch to collect metrics from only the listed mount points. You can specify * as the value to collect metrics from all mount points. The default value is to collect metrics from all mount points.

If you replace your list of resources with * instead, does that now include /var/lib/docker in the output?
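One quick way to answer the first question is to compare device IDs. The following is only a sketch: same_filesystem is a hypothetical helper name, and it assumes GNU stat (the -c option) is available on the instance.

```shell
# Hypothetical helper: prints "same" if both paths live on the same
# filesystem (identical device ID), otherwise "different".
# Assumes GNU coreutils stat, which supports -c '%d'.
same_filesystem() {
  dev_a=$(stat -c '%d' "$1") || return 1
  dev_b=$(stat -c '%d' "$2") || return 1
  if [ "$dev_a" = "$dev_b" ]; then
    echo "same"
  else
    echo "different"
  fi
}

# Example call (run on the instance itself):
#   same_filesystem / /var/lib/docker
```

If this prints "same", /var/lib/docker is just a directory on the root filesystem rather than a separate mount, so there would be no separate disk figures for it.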

EXPERT
Steve_M
answered a year ago
  • Thanks for the suggestion. A wildcard value for resources didn't make any difference. See my conclusion in my answer below.

  • I accepted this answer because the fact that all the docker mount points are on the root file system is the explanation for what I'm seeing, and there's nothing that the cloudwatch metrics agent can do differently.


Setting resources to ["*"] didn't make any difference. I think it's just not possible to collect metrics separately on individual mount points of an overlay file system. When I run df on the host, each docker container mount shows the same utilization percentage as the root file system:

$ sudo df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        2.0G     0  2.0G   0% /dev
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           2.0G  1.4M  2.0G   1% /run
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/xvda1       32G   25G  7.1G  78% /
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/fcc41653c5c01453bbd27adda4ff6014cd0c2664efbfb7d0830e636eacf9668b/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/45a8c22e6a157f3a945b8bc419a6e48333edfcdb1dcee771537477f2e3354803/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/606fddf437692ef6a45fbe90c1f19da5789f9ca877ece69607de7689c20215c3/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/ad0bedf886c9fe2d760f27e9a07206f469d54399a91f5a2348c6a869467b6517/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/5b5a6d3e1048fe37f78c1cd61cf17290bf3424c9c4e358b15a532b6f22c00499/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/0d0d1b2e2707e4c8bc734a6f2fe34fe0ad1f318fe012f7d1e07b8d4c7ba9b638/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/f1a53eef4a0c1bead31a38cae74dd9ced3d65a4a0bfc18382bc8c17b3a105349/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/f58637dd7d9c4a27dc99cd4abdfd237fce93f9e2a7ae1cb8666294501ce8eaf1/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/1c7a698c69b15ae8835a864de0f755c2fc1bcc2eec3b3427d6d05974ce228f84/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/2656b12ea0a5b0c4289ae89ea162bdabdefab129ee73cdeeb3589f577980d455/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/5a28b1872fff2a94befacca38de51f2f61ea19ed7036d975e8765d696b680b2c/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/cb584f71cc5b5dd1b397be9606cf70eafccf6d9585a9c1a78b7a0d4da08e1451/merged
overlay          32G   25G  7.1G  78% /var/lib/docker/overlay2/464e9c7c9f58aa84b59cbdb8e9f25cc49a4eae6cf7a80e7e0c4365e4acb803d7/merged
tmpfs           392M     0  392M   0% /run/user/1000

So reporting metrics on all those mount points separately would be meaningless anyway.
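To make the duplication easier to see, the df rows can be grouped by their size/used/available figures. This is a rough sketch: summarize_df is a hypothetical helper name, and it assumes the usual df column layout with a single header line.

```shell
# Hypothetical helper: reads df output on stdin, groups rows by their
# size/used/available/use% columns, and prints how many mount points
# share each set of figures (most common first).
summarize_df() {
  awk 'NR > 1 {
         key = $2 " " $3 " " $4 " " $5   # size, used, available, use%
         count[key]++
       }
       END { for (k in count) print count[k], k }' | sort -rn
}

# Usage on the instance:
#   df -P | summarize_df
```

On a host like the one above, the root file system and every overlay mount should collapse into a single line, which is another way of seeing that they all report the same underlying disk.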

I changed my dashboard to just report the percent used and the number of free inodes as gauges. I can alarm on those values and investigate manually when I reach a warning threshold.

mleone
answered a year ago
  • It's what I suspected in the first sentence of my initial reply: you actually have only one filesystem on disk, which is the root filesystem.

    The overlay filesystems under /var/lib/docker/* are actually just the root filesystem mounted in 13 other places (it's more complicated than that but there isn't the space to explain it here). All the metrics associated with these are the same as for the root filesystem, because it's the same filesystem, so things like total number of blocks / % used / free i-nodes / blocks read & written per second / and so on would just be the same figure repeated 14 times.

    Also be aware that if one of your Docker containers runs away and starts consuming huge amounts of disk, it will fill up the disk available to all of the other containers as well as the root filesystem itself. The reverse is also true: the containers could all be behaving themselves, but if something in the OS starts logging thousands of messages per second to /var/log, it will eventually fill up the root filesystem, which will similarly impact Docker.

    The top four rows and the last row of your output are all tmpfs (or devtmpfs) filesystems. These live in memory and not on disk, and I'm not sure there's much value in having CloudWatch report on these (though there's no harm in doing so).


Hello,

I would like to mention that the CloudWatch agent looks for files at the end of the directory tree. Please refer to the following document for more information:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html#CloudWatch-Agent[…]-File-Logssection (Manually create or edit the CloudWatch agent configuration file - Amazon CloudWatch)

file_path – Specifies the path of the log file to upload to CloudWatch Logs. Standard Unix glob matching rules are accepted, with the addition of ** as a super asterisk. For example, specifying /var/log/**.log causes all .log files in the /var/log directory tree to be collected. For more examples, see Glob Library.

You can also use the standard asterisk as a standard wildcard. For example, /var/log/system.log* matches files such as system.log_1111, system.log_2222, and so on in /var/log.

Only the latest file is pushed to CloudWatch Logs based on file modification time. We recommend that you use wildcards to specify a series of files of the same type, such as access_log.2018-06-01-01 and access_log.2018-06-01-02, but not multiple kinds of files, such as access_log_80 and access_log_443. To specify multiple kinds of files, add another log stream entry to the agent configuration file so that each kind of log file goes to a different log stream.

From the correspondence shared, I also understand that you have made the same observation and determined that there isn't much value in reporting these separately, because the overlay filesystems under /var/lib/docker/* are actually just the root filesystem mounted in other places, as the df output you shared shows.

I hope you find the above information useful. Please feel free to write back to me on this case if you have further doubts and queries. I will be happy to assist you.

Stay safe and have a nice day.

AWS
SUPPORT ENGINEER
answered a year ago
