Questions tagged with Amazon Elastic Container Service

Automatically stop CodeDeploy ECS Blue/Green deployment on unhealthy containers

We are building a CI/CD setup in which we remotely trigger a CodePipeline pipeline. The pipeline fetches its task definition and appspec.yaml from S3 and includes a CodeDeploy ECS Blue/Green step that updates an ECS service. Images are also pushed to ECR remotely. This setup works: if the application being deployed is not faulty and is configured correctly, the deployment succeeds in under 5 minutes.

However, if the application does not pass health checks, or the task definition is broken, CodeDeploy keeps re-deploying this revision during its "Install" step indefinitely, leaving dozens of stopped tasks in the ECS service. According to some sources this should time out after an hour, but we have not tested this.

What we would like is for these failing deployments to stop and roll back automatically. Ideally, CodeDeploy should try to deploy the application only once and, if that fails, immediately cancel the deployment and thus the pipeline run. As far as we can tell from the AWS documentation, neither CodeDeploy nor the appspec.yaml we upload to S3 offers an option for this, so we are unsure how to configure it, or whether it is possible at all. We had two scenarios in mind:

1. After one health check failure, the deployment stops and rolls back.
2. The deployment times out after a period shorter than one hour, ideally under 10 minutes.

We currently have no alarms attached to the CodeDeploy deployment group, but my understanding was that these alarms are only checked before the installation step, to verify that the deployment can proceed, rather than being evaluated alongside the deployment.

In short, how would we configure either of these scenarios, or at least prevent CodeDeploy from endlessly deploying replacement task sets?
0 answers, 0 votes, 7 views, asked a day ago
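The second scenario in the question above (a timeout well under one hour) could be approximated outside CodeDeploy with a small watchdog process. This is a minimal Python sketch under stated assumptions, not a built-in CodeDeploy option: it assumes a client object exposing the `get_deployment` and `stop_deployment` methods of boto3's CodeDeploy client, and the function name `watch_deployment`, the 600-second default timeout, and the 15-second poll interval are hypothetical choices. It polls the deployment status and, if no terminal status is reached before the deadline, calls `stop_deployment` with `autoRollbackEnabled=True`.

```python
import time
from typing import Any, Callable

# Statuses after which the deployment no longer progresses
# (status values reported by the CodeDeploy GetDeployment API).
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def watch_deployment(
    client: Any,
    deployment_id: str,
    timeout_seconds: float = 600.0,  # hypothetical: 10 minutes
    poll_seconds: float = 15.0,      # hypothetical poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Stop a CodeDeploy deployment (requesting rollback) if it has not
    reached a terminal status within timeout_seconds.

    `client` is assumed to behave like boto3.client("codedeploy").
    Returns the terminal status observed, or "StoppedByWatchdog".
    """
    deadline = clock() + timeout_seconds
    while True:
        info = client.get_deployment(deploymentId=deployment_id)["deploymentInfo"]
        status = info["status"]
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            # Ask CodeDeploy to stop and roll back instead of letting it
            # keep replacing the failing task set.
            client.stop_deployment(
                deploymentId=deployment_id, autoRollbackEnabled=True
            )
            return "StoppedByWatchdog"
        sleep(poll_seconds)
```

With boto3 installed, this could be called as `watch_deployment(boto3.client("codedeploy"), deployment_id)` once the deployment has been created. The clock and sleep functions are injectable so the timeout logic can be exercised without calling AWS.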

ECS agent sporadically times out while fetching secrets from SSM Parameter Store

We have an ECS cluster in us-west-2 that runs a few ECS services. We also run some ECS tasks that are invoked periodically via EventBridge. All tasks use the EC2 launch type and run on container instances that we manage with an Auto Scaling group. The AMI currently in use is amzn2-ami-ecs-hvm-2.0.20220630-x86_64-ebs. Container instances are launched in private subnets, and VPC endpoints are set up for a few AWS services, including SSM.

A few months ago we started seeing missed check-ins from the periodically launched tasks and found that at least some of them failed to launch due to a timeout from the SSM API endpoint. In the ecs-agent log, it shows up like this:

> level=error time=2022-09-19T22:30:56Z msg="Failed to create task resource" error="fetching secret data from SSM Parameter Store in us-west-2: RequestError: send request failed\ncaused by: Post \"https://ssm.us-west-2.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" task="..." resource="ssmsecret"

> level=info time=2022-09-19T22:30:56Z msg="Setting terminal reason for task" reason="fetching secret data from SSM Parameter Store in us-west-2: RequestError: send request failed\ncaused by: Post \"https://ssm.us-west-2.amazonaws.com/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" task="..."

We tried increasing the throughput of SSM Parameter Store through its settings, but it didn't seem to have any effect: https://docs.aws.amazon.com/systems-manager/latest/userguide/parameter-store-throughput.html

The other guides and Q&As I could find were about network misconfigurations that lead to a complete inability to reach SSM, whereas the symptom I'm seeing is only intermittent; most of the time the ECS tasks launch without any issue: https://aws.amazon.com/premiumsupport/knowledge-center/ssm-tcp-timeout-error/

What could be the cause? What else can I look into?
0 answers, 0 votes, 18 views, asked 10 days ago
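One way to check whether the intermittent SSM timeouts in the question above follow a pattern (for example, clustering when the EventBridge-scheduled tasks start) is to bucket them from the ecs-agent log by hour. This is a minimal Python sketch that assumes the logfmt-style `key=value` line format quoted in the question; the function names `parse_logfmt` and `ssm_failures_by_hour` are hypothetical, and the filter matches only on the `msg`, `resource`, and `error` values shown in those log lines.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# key=value pairs; a value is either a double-quoted string (which may
# contain backslash escapes such as \" and \n) or a bare token.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split one agent log line into fields.

    Surrounding quotes are stripped; escape sequences inside values
    are left as-is.
    """
    fields: Dict[str, str] = {}
    for key, raw in PAIR_RE.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1]
        fields[key] = raw
    return fields


def ssm_failures_by_hour(lines: Iterable[str]) -> Counter:
    """Count SSM secret fetch timeouts per hour bucket (e.g. "2022-09-19T22")."""
    counts: Counter = Counter()
    for line in lines:
        f = parse_logfmt(line)
        if (
            f.get("msg") == "Failed to create task resource"
            and f.get("resource") == "ssmsecret"
            and "Client.Timeout exceeded" in f.get("error", "")
        ):
            # time looks like 2022-09-19T22:30:56Z; the first 13
            # characters identify the hour.
            counts[f.get("time", "")[:13]] += 1
    return counts
```

The lines of the agent log files collected from each container instance could be fed in, and the resulting hourly counts compared against the task schedules and the Auto Scaling group's activity.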