Two identically configured Elastic Beanstalk environments, log streaming works in one but not the other

This is a copy of a question I asked earlier on Stack Overflow; hoping I can get some useful responses here. Edits: formatting.

I have a Node.js application running in Docker, deployed to an Elastic Beanstalk cluster via ECS. The application has two environments, call them "stage" and "prod". Both environments are configured to stream (non-custom) instance logs to CloudWatch, with identical security policies in place. Log streaming works correctly in one environment ("stage"), while the other ("prod") does not stream to CloudWatch: log groups and streams are created, but no events are ever written, and logs instead get written to disk on each EC2 instance.

I have verified the following are true for both environments:

  1. Both environments are in the same region (us-east-1)

  2. Identical platform and version (Docker on Amazon Linux 2/3.0.0).

  3. The Instance log streaming to CloudWatch Logs option is enabled in the Software section of the configuration tab in the EB web console.

  4. Identical settings for Retention (3 days) and Lifecycle (Delete logs upon termination).

  5. Code deployed (a public-facing GraphQL API, if that matters) that writes a lot of logging output to the console via console.debug, console.info, and friends.

  6. Custom Service Role set on the Security section of the EB console's configuration tab. Both service roles resolve to the IAM role set as the instance profile.

  7. Custom IAM instance profile roles with identical permissions, trust relationships, and permission policies, as shown below:

Trusted entities
  Identity providers: ec2.amazonaws.com, elasticbeanstalk.amazonaws.com

Condition       Key              Value
StringEquals    sts:ExternalId   elasticbeanstalk

Permissions policies

AmazonEC2ContainerRegistryReadOnly
AWSElasticBeanstalkEnhancedHealth
AWSElasticBeanstalkWebTier
AWSElasticBeanstalkMulticontainerDocker
AmazonEC2ContainerRegistryPowerUser
AWSElasticBeanstalkWorkerTier
sns-topic-publish-allow-policy
cloudwatch-allow-policy
AWSElasticBeanstalkManagedUpdatesCustomerRolePolicy


cloudwatch-allow-policy policy document:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream"
            ],
            "Resource": "*"
        }
    ]
}
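
If it helps to compare the roles outside the console, the same information can be pulled with the AWS CLI (the role names below are placeholders; substitute the actual instance profile roles for stage and prod):

# Placeholder role names; substitute the instance profile roles attached to stage and prod.
aws iam list-attached-role-policies --role-name stage-instance-profile-role
aws iam list-attached-role-policies --role-name prod-instance-profile-role
aws iam get-role --role-name prod-instance-profile-role --query 'Role.AssumeRolePolicyDocument'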

Both environments otherwise run correctly, sit at Green/OK health status, and report no permission problems. The differences are that "stage" is not load balanced or scaled and runs on a smaller instance size, while "prod" has load balancing and scaling (which I'm assuming is irrelevant, but I can share details on that if it isn't).

Expected behavior - stage

When the application deployed to the stage environment writes something to the console, it appears as an event in a CloudWatch stream named /aws/elasticbeanstalk/stage/var/log/eb-docker/containers/eb-current-app/stdouterr.log > %EC2-INSTANCE-ID%, as I expect it to. If I ssh into the instance that wrote the log, nothing is written to disk under /var/log/eb-docker/containers/eb-current-app, which is also expected.
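
A quick way to confirm events are actually arriving (assuming the group and stream naming shown above; the instance ID below is a placeholder) is something like:

# Group name follows the EB naming above; the stream name is the instance ID (placeholder here).
aws logs get-log-events \
    --region us-east-1 \
    --log-group-name "/aws/elasticbeanstalk/stage/var/log/eb-docker/containers/eb-current-app/stdouterr.log" \
    --log-stream-name "i-0123456789abcdef0" \
    --limit 10

Against the stage group this returns recent stdout/stderr events; against the prod group described below it returns an empty event list.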

Observed behavior - prod

When the application deployed to the prod environment writes something to the console, on the other hand, nothing is written to CloudWatch. Log groups named /aws/elasticbeanstalk/prod/var/log/eb-docker/containers/eb-current-app/stdouterr.log > %EC2-INSTANCE-ID% appear, but no events are ever logged. If I ssh into the instance that wrote the log, the logged text appears on disk under /var/log/eb-docker/containers/eb-current-app/eb-%SOME_HASH%-stdouterr.log, and if the Instance log streaming to CloudWatch Logs option is left enabled, the instances eventually fill up their available disk space with log contents and crash.

This condition has survived multiple instance restarts, waits of multiple hours with the streaming option enabled, the termination and rebuild of every instance in the environment, and deployment of new application versions from ECS.
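
On the instances themselves, one thing worth comparing between a stage box and a prod box is the CloudWatch agent's on-box configuration (the paths below assume the standard amazon-cloudwatch-agent install location, and the grep target is just the log path from above):

# List the agent's config files and check whether the eb-docker log path is referenced in any of them.
sudo ls -R /opt/aws/amazon-cloudwatch-agent/etc/
sudo grep -r "eb-docker/containers/eb-current-app" /opt/aws/amazon-cloudwatch-agent/etc/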

If I clone stage to a new environment, log streaming works as expected. If I clone prod to a new environment, log streaming fails in exactly the same manner as the original environment. Something is clearly misconfigured for prod but I don't have a clue what it is. What am I missing?

  • Update - We terminated and rebuilt "prod" via a Terraform script, and log streaming is working there now. No clue what the original problem was.

    HOWEVER, the individual instances now slowly fill up their disks because the /var/log/eb-docker/containers/eb-current-app/eb-###-stdouterr.log file never seems to be deleted or truncated.

    My understanding is that these instances are supposed to be configured with log rotation by default to handle this, but either it isn't working correctly or it doesn't run often enough to keep up with our logging load (which I don't imagine is too aggressive: a few lines of JSON per request coming in over the GraphQL API). A sketch of the rotation check I have in mind is below.

    What am I missing here?
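
    For reference, this is the kind of rotation check I had in mind, run on one of the instances (the config path is an assumption about how the EB Amazon Linux 2 Docker platform lays out its hourly logrotate configs; adjust the file name to whatever is actually present):

        # See which logrotate configs EB installed, then dry-run the eb-docker one (path is an assumption).
        ls /etc/logrotate.elasticbeanstalk.hourly/ /etc/cron.hourly/
        sudo logrotate -d /etc/logrotate.elasticbeanstalk.hourly/logrotate.elasticbeanstalk.eb-docker.conf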

1 Answer

You could try checking (or messing around with) the logging agent:

/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status
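
If it reports as running but nothing ships, the agent's own log may say why (assuming the default install location):

# Default agent log location for the amazon-cloudwatch-agent package (an assumption; adjust if yours differs).
tail -n 100 /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log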
answered 2 years ago
  • The agent seems to be running everywhere; I see this output:

    [ec2-user]$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status
    {
      "status": "running",
      "starttime": "2022-01-20T17:27:51+0000",
      "version": "1.237768.0"
    }
    
