re:Invent 2025 - From code to cloud: Accelerate application development with Amazon ECS
Getting a containerized application to production means coordinating networking, load balancers, TLS certificates, autoscaling, and observability configurations. Amazon ECS Express Mode collapses that process to three inputs. This post covers the new feature and how platform teams can build effective internal developer platforms on ECS.
Deploying containerized applications has always required significant AWS expertise. Before a single request can be served, developers must configure VPCs, task definitions, Application Load Balancers (ALBs), target groups, autoscaling policies, and observability pipelines. For platform teams, scaling that process across hundreds of development teams multiplies the challenge further. In this post, we'll explore how Jennifer Lin, Principal Product Manager for Amazon ECS; Tsahi Duek, Principal GTM Solutions Architect for Containers at AWS; and Keith Bartholomew, Principal Software Engineer at GoDaddy, addressed these challenges at re:Invent 2025, covering new ECS capabilities and real-world patterns for building developer platforms at scale.
Amazon ECS: A fully managed, versionless foundation
Amazon ECS is a fully managed container orchestration service with no control plane for you to manage, patch, or upgrade. When you create a cluster, you are creating a logical grouping, not infrastructure that requires ongoing care. This is a meaningful distinction: the ECS control plane is versionless and operated by AWS, which means no upgrade coordination windows, no patching schedules, and no operational overhead at the orchestration layer.
ECS supports multiple compute options. AWS Fargate remains the most widely adopted choice, providing serverless compute where AWS manages the underlying infrastructure completely. In Fargate, each task runs on a dedicated EC2 instance, creating a strong security boundary that is particularly valued in regulated industries. A new addition is ECS Managed Instances, which provides access to a broader range of EC2 instance types (including GPU, network-optimized, and memory-optimized options) while AWS still handles patching, maintenance, and bin-packing. This is a good fit when you need specific instance hardware but want to keep the operational model of Fargate.
On deployment strategies, ECS historically supported rolling deployments, with teams turning to AWS CodeDeploy for blue-green patterns. This year, ECS added native blue-green, canary, and linear deployment strategies directly within the service. The wiring of target groups and load balancer configurations is now handled automatically. ECS also includes Service Connect, a built-in service discovery and mesh capability that requires no installation, no patching, and no separate maintenance.
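To make the difference between the three strategies concrete, here is a minimal sketch of the traffic shape each one produces. It is an illustration only: the function names are hypothetical, and it does not model ECS configuration details such as bake times, step intervals, or alarm checks.

```python
import math


def blue_green_schedule() -> list[float]:
    """All traffic moves to the new version in a single step."""
    return [100.0]


def canary_schedule(canary_percent: float) -> list[float]:
    """A small share first, then everything once the canary looks healthy."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 0 and 100, exclusive")
    return [canary_percent, 100.0]


def linear_schedule(step_percent: float) -> list[float]:
    """Equal increments until the new version carries all traffic."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be greater than 0 and at most 100")
    steps = math.ceil(100 / step_percent)
    # Compute each step from its index so rounding error does not accumulate.
    return [min(100.0, step_percent * i) for i in range(1, steps + 1)]
```

Each list holds the percentage of traffic on the new version after each step. A 25 percent linear deployment, for example, passes through 25, 50, 75, and 100 percent.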
ECS Express Mode: From container image to live service in three inputs
The centerpiece of this session was Amazon ECS Express Mode, launched the week before re:Invent 2025. The ECS team identified a pattern repeating across customer accounts: the same collection of resources (VPCs, ALBs, task definitions, autoscaling, TLS certificates, and CloudWatch logging) assembled every time a new service was deployed, with most of that configuration happening outside of ECS itself.
Express Mode addresses this directly. To create a service, you provide your container image and two IAM (Identity and Access Management) roles. The task execution role is familiar to ECS users; it allows ECS to pull your image from Amazon ECR and configure logging. The infrastructure role is new and is what Express Mode uses to provision supporting resources on your behalf. From those three inputs, Express Mode creates a highly available, scalable service configured with AWS best practices: canary deployments, alarm-based rollbacks, TLS certificates, autoscaling policies, Amazon CloudWatch logging, Availability Zone rebalancing, and minimally permissive inbound security groups. At the end of the deployment, you have a live application URL.
The CLI experience includes a --monitor-resources flag that provides an interactive view of your service's create, update, and delete lifecycle, showing resource ARNs, statuses, and events in real time. Express Mode manages lifecycle operations that are complex in standard ECS, including updating ports or health checks and moving a service between public and private subnets, handling the coordination that would otherwise require multiple steps across different resources.
On the infrastructure-as-code side, Express Mode is available as a single AWS CloudFormation resource (AWS::ECS::ExpressGatewayService) with a small number of required parameters, a significant reduction compared to the full template needed to provision the equivalent architecture manually. It is also available in the AWS CDK as an L1 construct, via Terraform, and through a new GitHub Action that lets you build a container image, push to ECR, and deploy to an Express Mode service directly from a repository workflow.
A key design decision was keeping the underlying resources visible and modifiable. If you add a sidecar container directly to the task definition in the console, Express Mode persists that change when you next update the service through its interface. This is what Tsahi described as "composition within an abstraction": you can operate entirely within the Express Mode interface for everyday tasks, or go directly to the underlying resources when you need to. You do not have to forfeit one to access the other.
On cost sharing, Express Mode services deployed to the same subnets in the same account share an ALB, up to 25 services per load balancer. When a 26th service is created, Express Mode provisions a second load balancer automatically. Express Mode itself carries no additional charge; you pay only for the underlying resources such as Fargate compute and ALB usage.
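The ALB sharing rule is simple arithmetic, sketched below for capacity planning. The 25-service limit comes from the session; the function name is hypothetical, and the sketch assumes all services sit in the same account and subnets.

```python
SERVICES_PER_ALB = 25  # sharing limit stated in the session


def albs_needed(service_count: int, per_alb: int = SERVICES_PER_ALB) -> int:
    """Number of shared load balancers for a given number of Express Mode services."""
    if service_count < 0:
        raise ValueError("service_count must not be negative")
    # Ceiling division using integers only.
    return (service_count + per_alb - 1) // per_alb
```

With this rule, 25 services fit on one load balancer and the 26th triggers a second, so 60 services would share three.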
Building internal developer platforms on ECS
For platform teams, Tsahi outlined three design principles that shape effective developer platforms: lifecycle management, economies of scale, and break glass procedures.
Lifecycle management means owning the full deployment lifecycle from creation to decommission, with a clear entry point for developers. Economies of scale means sharing resources such as ECS clusters, Amazon VPC configurations, and ALBs across teams rather than provisioning a dedicated set per application. Break glass procedures are the escape hatches that let developers go beyond what the platform covers without losing the benefits they already have.
Keith Bartholomew described GoDaddy's internal platform, Katana, which serves more than 2,000 engineers across hundreds of AWS accounts. The platform team of seven or eight engineers chose ECS Fargate specifically because managing Kubernetes nodes at that scale was not feasible. Fargate removed the patching and scheduling burden without restricting developer flexibility. Every deployment in Katana creates a new ECS service and target group, leaving the previously running service untouched as a warm standby. Traffic shifts to the new version gradually or immediately, and the prior version remains available as a fallback. Some teams run five or more concurrent versions for live preview environments or dark release testing.
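The version-per-deployment pattern can be modeled in a few lines. The sketch below is a toy model, not GoDaddy's implementation: the class and method names are hypothetical, and weights stand in for whatever traffic-shifting mechanism Katana actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceVersions:
    """Each deployment adds a version; earlier versions stay registered as standbys."""

    weights: dict[str, int] = field(default_factory=dict)  # version -> traffic percent

    def deploy(self, version: str) -> None:
        """Register a new version without touching the ones already running."""
        if version in self.weights:
            raise ValueError(f"{version} is already deployed")
        # The first version takes all traffic; later ones start at zero.
        self.weights[version] = 0 if self.weights else 100

    def shift(self, source: str, target: str, points: int) -> None:
        """Move `points` percentage points of traffic from source to target."""
        if not 0 <= points <= self.weights[source]:
            raise ValueError("cannot shift more traffic than the source carries")
        self.weights[source] -= points
        self.weights[target] += points
```

A gradual rollout is a series of small `shift` calls, an immediate one is a single call for the full weight, and a rollback is the same call with source and target swapped. Because the prior version is never removed, the fallback is always available.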
Amazon Route 53 records provide latency-based routing between regions and automatic failover, configured out of the box for teams running in multiple regions. A centralized observability team ensures that every service managed by Katana gets logs, traces, and metrics piped to a shared observability platform with zero configuration required from the developer.
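For readers less familiar with latency-based routing and failover, the routing decision reduces to "lowest latency among healthy regions". The sketch below illustrates that decision only. Route 53 makes it at the DNS layer using its own measurements and health checks, and the function name and inputs here are hypothetical.

```python
def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> str:
    """Return the healthy region with the lowest latency for a given client."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda region: candidates[region])
```

When the closest region fails its health check, it drops out of the candidate set and the next-closest region is returned, which is the failover behavior.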
For escape hatches, GoDaddy lets developers supply their own WAF ACLs, security groups, and IAM policies by ARN. When the platform cannot handle a specific requirement, developers take ownership of that resource and the rest of the platform stays intact. CloudFormation hooks enforce governance guardrails automatically, so developers cannot create ECS services with public IPs, and the platform itself operates within those same constraints. As Bartholomew noted, Katana does nothing that developers could not do themselves. It accelerates the path; it does not restrict it.
GoDaddy also built an AI agent powered by an Amazon Bedrock knowledge base that reads deployment events and CloudFormation state in real time and provides actionable, context-specific guidance when a deployment fails, rather than surfacing raw ECS error messages.
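The public-IP guardrail mentioned above can be sketched as a check over a CloudFormation template. This is not a CloudFormation hook handler and not GoDaddy's code: it is a plain function with hypothetical names, assuming the template has already been parsed into a dictionary. It relies only on the AssignPublicIp property of AWS::ECS::Service.

```python
def violates_public_ip_rule(resource: dict) -> bool:
    """True if the resource is an ECS service that requests a public IP."""
    if resource.get("Type") != "AWS::ECS::Service":
        return False
    awsvpc = (
        resource.get("Properties", {})
        .get("NetworkConfiguration", {})
        .get("AwsvpcConfiguration", {})
    )
    return awsvpc.get("AssignPublicIp") == "ENABLED"


def find_violations(template: dict) -> list[str]:
    """Logical IDs of every ECS service in the template that breaks the rule."""
    return [
        logical_id
        for logical_id, resource in template.get("Resources", {}).items()
        if violates_public_ip_rule(resource)
    ]
```

A real hook would run a check like this before the resource is provisioned and fail the operation when the list is not empty. Because the rule applies to every template, the platform's own deployments pass through it too.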
The session closed with a point on platform design philosophy: engineers at a company exist across a wide spectrum of AWS expertise, from deep infrastructure specialists to developers who just need their service running. A platform that only serves one end of that spectrum leaves significant value on the table. Designing for the full range, so that experts can go deep and newcomers can get started quickly, is what Katana set out to do and what Express Mode embodies at the service level.
Watch the full session: re:Invent 2025 CNS341 - From code to cloud: Accelerate application development with Amazon ECS