re:Invent 2025 - Accelerate software delivery with Amazon ECS

7 minute read
Content level: Advanced

Teams running production services on Amazon ECS face a recurring challenge: deploying new features quickly without affecting existing customers. This session introduces the advanced deployment strategies now available in ECS, including a new lifecycle hooks feature that integrates custom testing and approval logic directly into the deployment process.

The faster a team ships code, the more often it creates opportunities for customers to encounter an unstable revision. At AWS re:Invent 2025, Kevin Gibbs, Principal Software Development Engineer at Amazon Elastic Container Service (Amazon ECS), and Mike Rizzo, Principal Solutions Architect for UK & Ireland at AWS, showed how Amazon ECS advanced deployments reduce that risk without slowing delivery. This post covers the four ECS deployment strategies, the new lifecycle hooks feature, how these strategies apply across different service exposure patterns, and how to migrate from AWS CodeDeploy.

The four deployment strategies and lifecycle hooks

Every Amazon ECS deployment is built from two components: a task definition, which specifies a single unit of work, and service configuration, which defines how one or more tasks run as a service. Together these produce a service revision, and moving from one revision to the next is a deployment.

Amazon ECS offers four deployment strategies. Rolling is the default: new tasks are created as old ones are stopped, keeping capacity roughly constant. The three advanced strategies (blue/green, canary, and linear) run two full sets of tasks in parallel throughout the deployment, temporarily doubling your task count. Because the previous revision remains fully scaled during this window, reverting to it does not require provisioning new capacity.

The three advanced strategies differ in how they shift traffic. Blue/green moves production traffic from the old set to the new set in a single step, which works well when clients should consistently interact with one revision. Canary shifts a configurable percentage of traffic (as low as 0.1%) to the new set first, waits a configurable bake time, then completes the shift. Linear distributes the shift in equal increments over time, with the minimum increment set at 3%, which is useful for observing service behavior as load gradually moves to the new revision.
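To make the difference between the three traffic-shift shapes concrete, here is a minimal sketch that computes the cumulative share of production traffic on the new revision after each step. The function names are illustrative and not part of any ECS API, the range checks mirror the 0.1% and 3% minimums mentioned above, and capping the final linear step at 100% is an assumption made for this example.

```python
def blue_green_schedule() -> list[float]:
    """Blue/green: all production traffic moves in a single step."""
    return [100.0]


def canary_schedule(canary_percent: float) -> list[float]:
    """Canary: a small first slice, then (after the bake time) everything else."""
    if not 0.1 <= canary_percent <= 100.0:
        raise ValueError("canary percentage should be between 0.1 and 100")
    return [canary_percent, 100.0] if canary_percent < 100.0 else [100.0]


def linear_schedule(step_percent: float) -> list[float]:
    """Linear: equal increments until the new revision holds all traffic."""
    if not 3.0 <= step_percent <= 100.0:
        raise ValueError("linear increment should be between 3 and 100")
    schedule: list[float] = []
    shifted = 0.0
    while shifted < 100.0:
        # Assumption for this sketch: the last step is capped at 100%.
        shifted = min(100.0, round(shifted + step_percent, 1))
        schedule.append(shifted)
    return schedule


if __name__ == "__main__":
    print("blue/green:", blue_green_schedule())
    print("canary 5% :", canary_schedule(5.0))
    print("linear 25%:", linear_schedule(25.0))
```

With a 25% increment, the linear schedule has four steps, while a 5% canary has two: the initial slice and the full shift.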

This year, Amazon ECS added lifecycle hooks, which let you insert custom logic at key points in the deployment process. Each advanced deployment moves through a defined sequence: pre-scale-up, scale-up, post-scale-up, test traffic shift, post-test traffic shift, production traffic shift, post-production traffic shift, and cleanup. At hookable stages, you attach an AWS Lambda function that runs before the deployment proceeds. The hook returns success (proceed), failure (roll back), or in-progress (Amazon ECS re-invokes the hook on a polling interval).
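The three possible hook outcomes map naturally onto a small Lambda handler. The sketch below assumes the function reports its result in a `hookStatus` field with the values `SUCCEEDED`, `FAILED`, and `IN_PROGRESS`; that is our understanding of the hook contract and should be checked against the current ECS documentation. The `run_checks` helper and the `simulatedOutcome` event key are invented purely for illustration.

```python
# Status strings as we understand the ECS lifecycle hook contract (an assumption; verify before use).
SUCCEEDED, FAILED, IN_PROGRESS = "SUCCEEDED", "FAILED", "IN_PROGRESS"


def run_checks(event: dict) -> str:
    """Hypothetical stand-in for real validation logic.

    Returns 'passed', 'pending', or 'failed'. The 'simulatedOutcome' key is
    invented for this example and is not something ECS sends.
    """
    return event.get("simulatedOutcome", "passed")


def lambda_handler(event: dict, context: object) -> dict:
    """Translate the check outcome into proceed / roll back / ask again later."""
    outcome = run_checks(event)
    if outcome == "passed":
        return {"hookStatus": SUCCEEDED}    # deployment proceeds
    if outcome == "pending":
        return {"hookStatus": IN_PROGRESS}  # ECS re-invokes the hook later
    return {"hookStatus": FAILED}           # deployment rolls back


if __name__ == "__main__":
    print(lambda_handler({"simulatedOutcome": "pending"}, None))
```

Returning the in-progress status is what makes long-running checks practical: the function can return quickly and let the next invocation pick up where the previous one left off.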

Hooks can run admission control checks before the new revision scales up, execute automated tests while new tasks receive only test traffic, or gate the production shift on a manual approval parameter. With canary, the production traffic shift hook is invoked twice: once when the initial percentage starts flowing, and again after the canary bake time ends. Mike walked through a pattern where monitoring starts at the first invocation and results are evaluated at the second, letting the hook roll back if performance falls short.
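The two-invocation canary pattern can be sketched as a small state machine: the first call starts monitoring and lets the canary traffic flow, and the second call evaluates what was observed. This is an illustration of the idea rather than the code shown in the session. The `CanaryGate` class, the `read_error_rate` callable, and the in-memory `started` set are all hypothetical; a real Lambda function would need to keep that state somewhere durable between invocations, and the `hookStatus` response shape is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CanaryGate:
    """Tracks which deployments have already had their first hook invocation."""

    read_error_rate: Callable[[], float]  # hypothetical metric reader
    max_error_rate: float = 0.01          # illustrative threshold: 1% errors
    started: set[str] = field(default_factory=set)

    def handle(self, deployment_id: str) -> dict:
        if deployment_id not in self.started:
            # First invocation: canary traffic is starting, begin monitoring.
            self.started.add(deployment_id)
            return {"hookStatus": "SUCCEEDED"}
        # Second invocation: bake time is over, evaluate what we observed.
        self.started.discard(deployment_id)
        healthy = self.read_error_rate() <= self.max_error_rate
        return {"hookStatus": "SUCCEEDED" if healthy else "FAILED"}


if __name__ == "__main__":
    gate = CanaryGate(read_error_rate=lambda: 0.05)
    print(gate.handle("deploy-1"))  # first call lets the canary start
    print(gate.handle("deploy-1"))  # second call fails: 5% errors exceeds 1%
```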

Amazon ECS provides two additional rollback mechanisms alongside hooks. The deployment circuit breaker checks that new tasks reach a healthy state within a defined window. Amazon CloudWatch alarms let you trigger rollbacks based on metrics specific to your workload, such as HTTP error rates, CPU utilization, or queue depth. When using canary or linear, alarm thresholds should account for the mix of old and new tasks running at the same time: a service-wide metric blends both revisions, so a regression confined to the new tasks appears diluted while they carry only a small share of traffic.
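A quick calculation shows why thresholds need to reflect the traffic mix. The sketch below assumes the alarm watches a service-wide error rate and that requests split between revisions in proportion to the configured traffic share; both are simplifications, and the function name is illustrative.

```python
def blended_error_rate(new_share: float, new_rate: float, old_rate: float) -> float:
    """Service-wide error rate when `new_share` of requests hit the new revision.

    All arguments are fractions between 0 and 1.
    """
    if not 0.0 <= new_share <= 1.0:
        raise ValueError("new_share must be a fraction between 0 and 1")
    return new_share * new_rate + (1.0 - new_share) * old_rate


if __name__ == "__main__":
    # A 5% canary whose new revision fails 20% of requests,
    # while the old revision fails 0.5% of requests.
    observed = blended_error_rate(new_share=0.05, new_rate=0.20, old_rate=0.005)
    print(f"service-wide error rate: {observed:.4%}")
```

In this example the new revision is failing one request in five, yet the service-wide rate stays under 1.5%, so an alarm set at 2% would not fire during the canary phase. Alarming on metrics scoped to the new tasks, or lowering the threshold for the canary window, avoids that blind spot.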

Advanced deployments across service exposure patterns

ALB-fronted services. When an Application Load Balancer (ALB) handles routing, Amazon ECS adjusts the weights on listener rules to shift traffic between task sets. Advanced deployments work with ALB's request routing capabilities, including path-based, header-based, and host-based routing. AWS CodeDeploy, by contrast, does not support path-based routing with blue/green deployments. Configuring this requires a second target group, a production listener rule ARN, an optional test listener rule ARN, and an IAM role that grants Amazon ECS permission to modify listener rule weights.

Service Connect. When a service is exposed internally through Service Connect, traffic shifting is handled by the Service Connect proxy running alongside each application container. Both task sets are registered in AWS Cloud Map, with the new revision labeled as the test instance. By default, requests carrying the x-ecs-blue-green-test header are routed to the new revision, and you can substitute a header-matching rule that fits your use case, such as matching on a user agent string or an API version number. Test traffic must originate from within the Service Connect namespace, so automated test clients need to be deployed there.
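The routing decision itself is simple to reason about. The following sketch simulates it in plain Python; it is not the Service Connect proxy's implementation. The default header name is taken from the paragraph above, while the `pick_revision` function, the exact-match behavior, and the example `x-api-version` header are assumptions for illustration.

```python
from typing import Optional

DEFAULT_TEST_HEADER = "x-ecs-blue-green-test"  # header name as given in this post


def pick_revision(
    headers: dict[str, str],
    header_name: str = DEFAULT_TEST_HEADER,
    exact_value: Optional[str] = None,
) -> str:
    """Return 'new' for test traffic and 'current' for everything else.

    If `exact_value` is None, the mere presence of the header marks a request
    as test traffic; otherwise the header value must match exactly.
    """
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get(header_name.lower())
    if value is None:
        return "current"
    if exact_value is not None and value != exact_value:
        return "current"
    return "new"


if __name__ == "__main__":
    print(pick_revision({"X-ECS-Blue-Green-Test": "1"}))
    print(pick_revision({"x-api-version": "2"}, "x-api-version", "2"))
    print(pick_revision({"x-api-version": "1"}, "x-api-version", "2"))
```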

Headless services. Some services pull messages from a queue and have no inbound traffic to route. Advanced deployments still apply because you benefit from having a fully scaled previous revision ready to reactivate quickly if the new version does not perform correctly. The recommended pattern deploys the new revision in a deactivated state, controlled by a flag in AWS Systems Manager Parameter Store. A lifecycle hook at the production traffic shift stage deactivates the old revision and activates the new one. CloudWatch alarms on queue depth can detect processing failures and trigger a rollback.
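The activation-flag pattern can be sketched with an in-memory dictionary standing in for Parameter Store. The parameter name, the revision labels, and all function names below are hypothetical, and the `hookStatus` response shape is assumed; a real worker would read the flag through the Systems Manager API and would likely cache it between polls.

```python
ACTIVE_REVISION_PARAM = "/orders-worker/active-revision"  # hypothetical parameter name


def should_process(flags: dict[str, str], my_revision: str) -> bool:
    """A worker only pulls from the queue while its own revision is the active one."""
    return flags.get(ACTIVE_REVISION_PARAM) == my_revision


def production_shift_hook(flags: dict[str, str], new_revision: str) -> dict:
    """Hook body: flipping one flag deactivates the old revision and activates the new."""
    flags[ACTIVE_REVISION_PARAM] = new_revision
    return {"hookStatus": "SUCCEEDED"}


def roll_back(flags: dict[str, str], previous_revision: str) -> None:
    """The old tasks are still fully scaled, so reactivating them is just a flag flip."""
    flags[ACTIVE_REVISION_PARAM] = previous_revision


if __name__ == "__main__":
    flags = {ACTIVE_REVISION_PARAM: "rev-1"}
    print(should_process(flags, "rev-2"))  # new tasks start deactivated
    production_shift_hook(flags, "rev-2")
    print(should_process(flags, "rev-1"), should_process(flags, "rev-2"))
```

Keeping a single "active revision" value, rather than one boolean per revision, means both sets of workers can never be active at the same time.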

Network Load Balancer. At re:Invent 2025, advanced deployments with Network Load Balancer (NLB) supported blue/green only; Amazon ECS extended NLB support to canary and linear strategies in February 2026. Teams running TCP/UDP workloads, or services that require low latency and static IP addresses, can now use gradual traffic shifting on NLB just as ALB users can. Because NLB operates at layer 4, path-based routing is not available, so a test listener, if configured, must use a different port than production. Amazon ECS also adds a 10-minute delay to the test traffic shift and production traffic shift stages to account for NLB connection handling.

Choosing a strategy and migrating from CodeDeploy

The choice between the three advanced strategies comes down to what you need during the traffic shift. Blue/green is the right choice when clients should always interact with one consistent revision, for example in a web application that maintains session state. Canary is well-suited for validating new behavior with a limited portion of production traffic before completing the rollout. Linear fits scenarios where you want to observe performance as load gradually shifts to the new revision.

Switching from rolling to an advanced strategy requires adding an advanced configuration block to the load balancer section of your service configuration, which specifies a second target group and the relevant listener rule ARNs. Once in place, you can move between rolling, blue/green, canary, and linear without further structural changes. Amazon ECS requires the advanced configuration to remain in place during rolling deployments because it needs both target groups to track which one is currently serving traffic.
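As a rough illustration of that shape, the sketch below builds the two pieces of service configuration as plain dictionaries without calling AWS. The field names (`advancedConfiguration`, `alternateTargetGroupArn`, `productionListenerRule`, `testListenerRule`, `strategy`, `bakeTimeInMinutes`, and the canary and linear sub-blocks) reflect our understanding of the ECS API and should be verified against the API reference. The ARNs, container name, and port are placeholders, and the helper functions are hypothetical.

```python
from typing import Optional


def load_balancer_block(
    primary_tg: str, alternate_tg: str, prod_rule: str, role_arn: str,
    test_rule: Optional[str] = None,
) -> dict:
    """Load balancer entry with the advanced configuration block attached."""
    advanced = {
        "alternateTargetGroupArn": alternate_tg,  # the second target group
        "productionListenerRule": prod_rule,
        "roleArn": role_arn,                      # lets ECS modify listener rule weights
    }
    if test_rule is not None:
        advanced["testListenerRule"] = test_rule  # optional
    return {
        "targetGroupArn": primary_tg,
        "containerName": "app",   # placeholder
        "containerPort": 8080,    # placeholder
        "advancedConfiguration": advanced,
    }


def deployment_configuration(strategy: str, bake_minutes: int = 5, percent: float = 10.0) -> dict:
    """Only this block changes when moving between strategies."""
    config: dict = {"strategy": strategy}
    if strategy == "BLUE_GREEN":
        config["bakeTimeInMinutes"] = bake_minutes
    elif strategy == "CANARY":
        config["canaryConfiguration"] = {
            "canaryPercent": percent, "canaryBakeTimeInMinutes": bake_minutes}
    elif strategy == "LINEAR":
        config["linearConfiguration"] = {
            "stepPercent": percent, "stepBakeTimeInMinutes": bake_minutes}
    elif strategy != "ROLLING":
        raise ValueError(f"unknown strategy: {strategy}")
    return config


if __name__ == "__main__":
    lb = load_balancer_block(
        "<primary-target-group-arn>", "<alternate-target-group-arn>",
        "<production-listener-rule-arn>", "<ecs-elb-role-arn>",
    )
    for strategy in ("ROLLING", "BLUE_GREEN", "CANARY", "LINEAR"):
        print({"loadBalancers": [lb], "deploymentConfiguration": deployment_configuration(strategy)})
```

The load balancer block is identical in all four printed configurations, which is the point of the paragraph above: once the second target group and listener rules are in place, changing strategy is a matter of editing the deployment configuration.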

For teams migrating from CodeDeploy, there are two paths. You can update the service in place by changing both the deployment controller and deployment configuration in a single call. Alternatively, you can create a replacement service and cut over when ready, which gives you time to validate lifecycle hook configurations before they handle production traffic. The main reasons to consider this migration are lifecycle hook support, Service Connect compatibility, path-based routing on ALB, and advanced deployment support for headless services.

Amazon ECS advanced deployment strategies give teams a way to increase deployment frequency without accepting longer rollback times. Lifecycle hooks make it possible to integrate existing test and approval workflows directly into the deployment process, so you can validate new code against real infrastructure before any customer is affected. The model of running old and new revisions in parallel with configurable traffic shifting applies consistently across ALB-fronted services, Service Connect, headless queue consumers, and Network Load Balancer configurations.

Watch the full session recording: re:Invent 2025 - Accelerate software delivery with Amazon ECS (CNS315)

AWS
EXPERT
published 2 months ago · 92 views