ECS agent sporadically times out while fetching secrets from SSM Parameter Store

We have an ECS cluster in us-west-2 that runs a few ECS services, plus some ECS tasks that are invoked periodically via EventBridge. All tasks use the EC2 launch type and run on container instances that we manage with an Auto Scaling group. The AMI currently in use is amzn2-ami-ecs-hvm-2.0.20220630-x86_64-ebs. The container instances are launched in private subnets, and VPC endpoints are set up for a few AWS services, including SSM.

A few months ago we started seeing missed check-ins from the periodically launched tasks, and found that at least some of them had failed to launch because of a timeout from the SSM API endpoint.

In the ecs-agent log, the failure looks like this:

level=error time=2022-09-19T22:30:56Z msg="Failed to create task resource" error="fetching secret data from SSM Parameter Store in us-west-2: RequestError: send request failed\ncaused by: Post "https://ssm.us-west-2.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" task="..." resource="ssmsecret"
level=info time=2022-09-19T22:30:56Z msg="Setting terminal reason for task" reason="fetching secret data from SSM Parameter Store in us-west-2: RequestError: send request failed\ncaused by: Post "https://ssm.us-west-2.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" task="..."
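
To get a sense of how often this happens and whether it clusters around particular times, something like the following could tally the failures per hour. This is only a rough sketch: the function name is made up for illustration, and the matching strings are assumptions taken from the log excerpt above, so they may need adjusting for other agent versions.

```python
import re
from collections import Counter

# Captures the date and hour from the "time=" field seen in the excerpt above.
TIME_RE = re.compile(r"time=(\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}Z")
# Substrings assumed to identify the failure; copied from the excerpt above.
SSM_MARKER = "fetching secret data from SSM Parameter Store"
TIMEOUT_MARKER = "Client.Timeout"


def count_ssm_timeouts_by_hour(lines):
    """Return a Counter mapping 'YYYY-MM-DDTHH' to the number of matching error lines."""
    counts = Counter()
    for line in lines:
        # Only count the error entry, not the follow-up info entry for the same task.
        if "level=error" not in line:
            continue
        if SSM_MARKER not in line or TIMEOUT_MARKER not in line:
            continue
        match = TIME_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle for the agent log can be passed in directly. On the ECS-optimized AMI the log is normally at /var/log/ecs/ecs-agent.log, with older entries in rotated files alongside it.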

We tried increasing the Parameter Store throughput limit in its settings (https://docs.aws.amazon.com/systems-manager/latest/userguide/parameter-store-throughput.html), but it did not seem to have any effect.

Other guides and Q&As I could find cover network misconfigurations that make SSM completely unreachable (for example, https://aws.amazon.com/premiumsupport/knowledge-center/ssm-tcp-timeout-error/), whereas the symptom I'm seeing is only intermittent: the ECS tasks launch without an issue most of the time.
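
Since the error says the request was canceled "while waiting for connection", one check that might help separate DNS problems from slow or failing TCP connects is a small probe run repeatedly from a container instance. The sketch below is only an illustration under that assumption: `probe_endpoint` is a hypothetical helper, not part of any AWS tooling, and it tests only name resolution and the TCP handshake, not TLS or the SSM API itself.

```python
import socket
import time


def probe_endpoint(host, port=443, timeout=5.0):
    """Resolve host, then attempt one TCP connect to each resolved address.

    Returns a list of (ip, seconds, error) tuples; error is None on success.
    A DNS failure is not caught here and surfaces as socket.gaierror.
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        start = time.monotonic()
        error = None
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
        except OSError as exc:  # includes timeouts and refused connections
            error = repr(exc)
        results.append((sockaddr[0], time.monotonic() - start, error))
    return results
```

Calling `probe_endpoint("ssm.us-west-2.amazonaws.com")` in a loop and logging the results would show whether specific addresses time out. If private DNS is enabled on the VPC endpoint, the resolved addresses should be private IPs inside the VPC, so a public IP or a single slow address in the output would be worth a closer look.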

What could be the cause? What else can I look into?

No Answers
