I am seeing the same behavior with the Service Connect sidecar.
It uses most of the memory of a 2 GB ECS Fargate service.
I raised an AWS support ticket, and it is currently under internal investigation by the dev team.
In general I think it is reasonable to define resource limits on the service-connect container.
Thanks for sharing your experience.
It appears the issue was caused by something internal to AWS ECS Service Connect.
Even after rebuilding everything from scratch, the same problem persisted. However, when we ran the exact same application without Service Connect, we confirmed that memory usage no longer increased.
So here’s how I’m working around it:
- Don’t use Service Connect.
- Place an internal NLB on the private network and switch inter-service communication to go through it. (You will need to distinguish services by port number.)
In my view, NLBs offer much better performance and shouldn’t trigger this kind of latent issue, so they’re a solid choice.
Also, if you run multiple tasks, each task spins up its own Service Connect sidecar, which I consider a waste of resources. With a single NLB, you avoid that overhead, and it’s much more reliable for scaling out.
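To illustrate the "distinguish services by port number" part of this workaround, here is a minimal sketch of how callers might resolve a service to an address on the shared NLB. The DNS name, service names, and port numbers are all hypothetical placeholders and do not come from the original setup.

```python
from typing import Dict

# Hypothetical values: the NLB DNS name, service names, and ports are placeholders.
NLB_DNS_NAME = "internal-nlb.example.internal"

# One listener port per service, since a single NLB tells services apart by port.
SERVICE_PORTS: Dict[str, int] = {
    "orders": 8081,
    "payments": 8082,
    "inventory": 8083,
}


def check_unique_ports(ports: Dict[str, int]) -> None:
    """Raise ValueError if two services were assigned the same listener port."""
    seen: Dict[int, str] = {}
    for name, port in ports.items():
        if port in seen:
            raise ValueError(
                f"port {port} is assigned to both {seen[port]!r} and {name!r}"
            )
        seen[port] = name


def service_url(service: str, path: str = "/") -> str:
    """Build the URL a caller would use to reach a service through the NLB."""
    if service not in SERVICE_PORTS:
        raise KeyError(f"no NLB listener configured for service {service!r}")
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{NLB_DNS_NAME}:{SERVICE_PORTS[service]}{path}"
```

Calling `check_unique_ports(SERVICE_PORTS)` at startup catches accidental port collisions early, and `service_url("orders", "health")` gives the address a client would call instead of the Service Connect alias.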
AWS Support applied a fix for this and asked us to do a force deployment for it to take effect. We've confirmed that this addressed the issue.
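For anyone unsure how to trigger the force deployment, here is a minimal sketch using the boto3 ECS client's `update_service` call with `forceNewDeployment=True`. The cluster and service names are placeholders, and it assumes boto3 is installed and AWS credentials are already configured.

```python
def force_deployment_request(cluster: str, service: str) -> dict:
    """Build the keyword arguments for an ECS UpdateService forced deployment."""
    return {"cluster": cluster, "service": service, "forceNewDeployment": True}


def force_new_deployment(cluster: str, service: str, client=None) -> dict:
    """Ask ECS to replace the running tasks without changing the task definition.

    `client` is optional so a stub can be passed in for testing; by default a
    boto3 ECS client is created from the ambient AWS configuration.
    """
    if client is None:
        import boto3  # imported lazily so the request builder works without boto3

        client = boto3.client("ecs")
    return client.update_service(**force_deployment_request(cluster, service))


if __name__ == "__main__":
    # Placeholder names: substitute your own cluster and service.
    response = force_new_deployment("my-cluster", "my-service")
    print(response["service"]["deployments"][0]["status"])
```

The AWS CLI equivalent is `aws ecs update-service --cluster my-cluster --service my-service --force-new-deployment`.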
Based on your description, you're experiencing a memory leak in the Service Connect Proxy sidecar container after a routine redeploy, which eventually leads to service outages when the proxy hits memory limits.
Here are some approaches to troubleshoot and potentially resolve this issue:
- Monitor Service Connect Proxy Metrics: You can set up specific monitoring for the Service Connect Proxy sidecar container. AWS provides metrics specifically for Service Connect that can help identify issues with the proxy. This would give you visibility into the proxy's behavior and help confirm your suspicion about memory leaks.
- Vulnerability Scanning: Consider activating Amazon Inspector on your container images, particularly the Service Connect sidecar. Some vulnerabilities can manifest as memory leaks, and identifying and addressing these could potentially resolve the issue.
- Health Checks and Deployment Configuration: Implement health checks for your tasks to better monitor the health of the Service Connect container. This can help ECS determine when containers are becoming unhealthy and replace them before they cause outages.
- Task Resource Allocation: Review your task definition resources. You might need to increase the memory allocation for your tasks to accommodate the Service Connect proxy's behavior until a more permanent solution is found.
- Deployment Strategies: Consider implementing different deployment strategies, such as rolling updates, to minimize the impact of any issues that arise with the Service Connect proxy.
- Verify Proxy Configuration: Ensure your Service Connect configuration is correct. You can verify that your connection is using the Service Connect proxy by checking for the "server: envoy" header in responses.
- Restart Strategy: As a temporary mitigation, you might consider implementing a scheduled task restart before memory usage reaches critical levels, though this is more of a workaround than a solution.
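To decide how often a scheduled restart would need to run, one rough approach is to fit a straight line to recent memory readings and estimate when usage will cross a chosen threshold. The sketch below assumes you already have (hours, MiB) samples from your own monitoring; the function names, the 80% threshold, and the sample values are illustrative assumptions, not part of any AWS API.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in hours, memory used in MiB)


def leak_rate_mib_per_hour(samples: List[Sample]) -> float:
    """Least-squares slope of memory usage (MiB) over time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def hours_until_threshold(
    samples: List[Sample], limit_mib: float, threshold: float = 0.8
) -> Optional[float]:
    """Hours from the last sample until usage reaches threshold * limit_mib.

    Returns None when memory is not growing, and 0.0 when the last sample is
    already at or above the threshold.
    """
    rate = leak_rate_mib_per_hour(samples)
    if rate <= 0:
        return None
    _, last_mib = samples[-1]
    remaining = threshold * limit_mib - last_mib
    return max(remaining / rate, 0.0)


if __name__ == "__main__":
    # Illustrative readings only: memory growing by 50 MiB per hour.
    readings = [(0.0, 500.0), (1.0, 550.0), (2.0, 600.0), (3.0, 650.0)]
    print(hours_until_threshold(readings, limit_mib=2048))
```

A linear fit is only a first approximation; if the leak depends on traffic volume, the real growth will be uneven, so it is safer to schedule restarts well inside the estimated window.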
Since this issue appeared after a routine redeploy without code changes, it's possible that there was an underlying change in the Service Connect proxy version or configuration. If the issue persists after trying these approaches, reaching out to AWS Support might be necessary as they can provide more specific guidance based on your exact setup and the version of the Service Connect proxy you're using.
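Before escalating, the "server: envoy" check mentioned in the list above can be scripted. This is a minimal sketch using only the Python standard library; the URL is a hypothetical placeholder for one of your own Service Connect endpoints, and the helper names are made up for illustration.

```python
import urllib.request
from typing import Mapping


def served_by_envoy(headers: Mapping[str, str]) -> bool:
    """Return True if the headers include 'server: envoy' (case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "server" and value.strip().lower() == "envoy":
            return True
    return False


def check_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Fetch `url` and report whether the response carries the Envoy header.

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so this
    only inspects headers of successful responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return served_by_envoy(dict(response.headers.items()))


if __name__ == "__main__":
    # Hypothetical endpoint: replace with a Service Connect alias reachable
    # from inside your task or VPC.
    print(check_endpoint("http://my-service.internal:8080/health"))
```

If this returns False from inside a task that is supposed to use Service Connect, the request is probably bypassing the proxy, which is worth mentioning in the support case.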
Sources
Monitoring Service Connect Proxy in ECS Fargate for Potential Memory Issues | AWS re:Post
ECS connect container unhealthy during new deployments to ECS EC2 | AWS re:Post
Troubleshoot service connect issues in Amazon ECS | AWS re:Post
I've observed this issue as well; it started on 8/27. I've raised a support ticket with AWS.