re:Invent 2025 - Scaling Serverless with platform engineering: A blueprint for success

6 minute read
Content level: Advanced

As engineering organizations scale serverless adoption, platform teams face a core tension: giving developers the speed and autonomy they need while keeping security, compliance, and cost under control. This session presents a concrete blueprint, with a real-world example that reduced new service creation time from five months to three hours.

Growing serverless adoption from one team to one hundred teams is not an infrastructure problem. It's an organizational engineering problem. In this post, we'll explore the platform engineering blueprint presented by Anton Aleksandrov, Principal Solutions Architect for Serverless at AWS, and Ran Isenberg, Principal Software Architect at CyberArk and AWS Serverless Hero, including how CyberArk applied these practices to reduce new service launch time by 99%.

From infrastructure bottleneck to platform engineering

Every engineering organization eventually runs into the same problem. A central infrastructure or DevOps team manages cloud resources. Development teams multiply, each opening tickets for infrastructure changes. The infrastructure team, without intending to, becomes the bottleneck that slows down every team depending on it.

Shift-left is the common response: push more infrastructure responsibility to the development teams. The problem is that shift-left without governance creates what Anton describes as "wild west" conditions. If 100 teams each choose their own CI/CD tooling, their own log retention settings, and their own IAM patterns, collecting evidence for a PCI or HIPAA audit can take seven to twelve months. Freedom without structure is expensive.

Platform engineering offers a middle path. Development teams own their full stack, but they build on vetted, standardized artifacts produced by the platform team. The platform team stops being a ticket queue and starts functioning as an internal product team, building the foundations that everyone else builds on top of.

In a serverless world, this tension has a specific shape. An AWS Lambda function contains business logic that belongs to the development team, but it also carries infrastructure concerns (runtime, memory, logging, IAM permissions) that the platform team cares about. A well-designed module catalog resolves this without forcing either team to surrender control.

Building vetted blueprints and guardrails

The most tangible output of a platform engineering team is a catalog of vetted infrastructure as code (IaC) modules. In AWS CDK, these are constructs. In Terraform, they're modules. The underlying principle is the same: encode best practices into reusable building blocks and make them available across the organization.

A baseline Lambda module might do several things developers rarely think about. It can default to a specific runtime, set a 14-day Amazon CloudWatch log retention policy (preventing indefinite log storage costs), and embed a standard version of Powertools for AWS Lambda for structured logging and tracing. None of these decisions needs to be re-litigated by each development team, because the platform team made them once. Developers who need different settings can override the defaults, but the safe configuration ships automatically.
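The "safe defaults with overrides" idea can be sketched in plain Python, independent of CDK or Terraform. This is a minimal illustration, not the module presented in the session: the `LambdaDefaults` class, the `baseline_lambda` helper, the default runtime and memory size, and the layer placeholder are all assumptions made for the example. Only the 14-day log retention comes from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LambdaDefaults:
    """Platform-owned defaults; a team can override any field."""
    runtime: str = "python3.13"      # assumed default runtime
    memory_mb: int = 512             # assumed default memory size
    log_retention_days: int = 14     # 14-day retention, as described above
    powertools_layer: str = "powertools-layer-pinned"  # placeholder, not a real ARN


def baseline_lambda(name: str, handler: str, **overrides) -> dict:
    """Describe a function built on the defaults, applying any overrides.

    An unknown override name raises TypeError, so typos fail early.
    """
    settings = replace(LambdaDefaults(), **overrides)
    return {
        "name": name,
        "handler": handler,
        "runtime": settings.runtime,
        "memory_mb": settings.memory_mb,
        "log_retention_days": settings.log_retention_days,
        "layers": [settings.powertools_layer],
    }


# Most teams take the defaults; one team overrides a single setting.
orders = baseline_lambda("orders", "app.handler")
reports = baseline_lambda("reports", "app.handler", log_retention_days=90)
```

In a real catalog the same shape would live inside a CDK construct or Terraform module, with the override surface limited to what the platform team is willing to support.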

The same approach extends to Amazon DynamoDB tables. A module can enforce server-side encryption, enable point-in-time recovery, and default to least-privilege IAM permissions rather than a wildcard. For Amazon Bedrock integrations, a module can standardize the model selection, temperature, and max_tokens values so that development teams don't need to become language model experts before building their first AI feature.
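As a rough sketch of what such a DynamoDB module encodes, the helpers below build CloudFormation-style table properties with encryption and point-in-time recovery switched on, plus an IAM policy scoped to specific actions on a single table. The function names, the on-demand billing mode, the string partition key, and the example ARN are assumptions for illustration; the session did not show this code.

```python
def secure_table(name: str, partition_key: str) -> dict:
    """CloudFormation-style AWS::DynamoDB::Table properties with safe settings on."""
    return {
        "TableName": name,
        "BillingMode": "PAY_PER_REQUEST",  # assumed default, not from the session
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        "SSESpecification": {"SSEEnabled": True},
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def read_only_policy(table_arn: str) -> dict:
    """IAM policy limited to read actions on one table, with no wildcards."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            }
        ],
    }


table = secure_table("orders", "order_id")
# Example ARN with a placeholder account ID.
policy = read_only_policy("arn:aws:dynamodb:us-east-1:123456789012:table/orders")
```

The point of the design is that a developer asks for "a table" and "read access", and the encryption, recovery, and scoping decisions arrive already made.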

Modules become more powerful through composability. An Amazon EventBridge scheduled trigger, an Amazon SQS queue with dead-letter queue (DLQ) redrive, and a Lambda processor are three separate modules. Wire them together and you have a reusable analytics pipeline blueprint. The connections between components matter as much as the components themselves, because in serverless architectures those event source mappings are also defined in code. Blueprints can also reach into function code to standardize cross-cutting concerns like observability instrumentation, configuration management, and tenant isolation so developers inherit them without re-implementing them for each new function.
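The composition described above can be modeled as three small components plus an explicit list of connections. The sketch below is a plain-Python illustration under stated assumptions: the class names, the `analytics_pipeline` function, the redrive threshold, and the batch size are hypothetical and stand in for real CDK constructs or Terraform modules.

```python
from dataclasses import dataclass, field


@dataclass
class Schedule:
    name: str
    expression: str  # EventBridge schedule expression, e.g. "rate(1 hour)"


@dataclass
class Queue:
    name: str
    max_receive_count: int = 3  # assumed redrive threshold

    @property
    def dlq_name(self) -> str:
        return f"{self.name}-dlq"


@dataclass
class Processor:
    name: str
    batch_size: int = 10  # assumed batch size


@dataclass
class Blueprint:
    name: str
    components: list = field(default_factory=list)
    connections: list = field(default_factory=list)  # (source, target, kind)


def analytics_pipeline(name: str, expression: str) -> Blueprint:
    """Wire schedule -> queue -> processor, with a DLQ behind the queue."""
    schedule = Schedule(f"{name}-schedule", expression)
    queue = Queue(f"{name}-queue")
    processor = Processor(f"{name}-processor")
    blueprint = Blueprint(name, [schedule, queue, processor])
    blueprint.connections.append((schedule.name, queue.name, "target"))
    blueprint.connections.append((queue.name, processor.name, "event-source-mapping"))
    blueprint.connections.append((queue.name, queue.dlq_name, "redrive"))
    return blueprint


pipeline = analytics_pipeline("clicks", "rate(1 hour)")
```

Keeping the connections as first-class data is what lets the blueprint, rather than each team, own the event source mapping and the redrive policy.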

Governance tools make this system reliable by catching problems at two stages. Proactive controls validate infrastructure code during local development and in the CI/CD pipeline, before changes are deployed. For Terraform, Checkov checks compliance against organizational rules; for CDK, AWS CloudFormation Guard and the open-source cdk-nag library serve the same role. Detective controls catch regressions at runtime. Together, they give developers real flexibility while keeping changes within boundaries the platform team has already validated. For example, a rule might cap Lambda memory for Node.js functions at 2 GB, because at 10 GB Lambda allocates six virtual CPUs that single-threaded JavaScript won't use unless the code is deliberately written to take advantage of them. Teams with a genuine reason to exceed the limit can request an exception; everyone else gets a safe default automatically.
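The logic of the memory-cap rule can be sketched as a small check over CloudFormation-style resources. This is not the Checkov, CloudFormation Guard, or cdk-nag API; it is a plain-Python illustration of the rule itself. The `check_node_memory` function, the exception list, and the sample template are assumptions made for the example.

```python
NODE_MEMORY_CAP_MB = 2048  # the 2 GB cap from the example above


def check_node_memory(resources: dict, exceptions: frozenset = frozenset()) -> list:
    """Return one message per Node.js function above the cap, skipping approved IDs."""
    violations = []
    for logical_id, resource in resources.items():
        if resource.get("Type") != "AWS::Lambda::Function":
            continue
        props = resource.get("Properties", {})
        runtime = props.get("Runtime", "")
        memory_mb = props.get("MemorySize", 128)  # CloudFormation default is 128 MB
        if not runtime.startswith("nodejs"):
            continue  # the rule only targets Node.js runtimes
        if memory_mb > NODE_MEMORY_CAP_MB and logical_id not in exceptions:
            violations.append(
                f"{logical_id}: {memory_mb} MB exceeds the "
                f"{NODE_MEMORY_CAP_MB} MB cap for {runtime}"
            )
    return violations


# Hypothetical template fragment with three functions.
template = {
    "ApiHandler": {
        "Type": "AWS::Lambda::Function",
        "Properties": {"Runtime": "nodejs22.x", "MemorySize": 1024},
    },
    "ImageResizer": {
        "Type": "AWS::Lambda::Function",
        "Properties": {"Runtime": "nodejs22.x", "MemorySize": 10240},
    },
    "MlScorer": {
        "Type": "AWS::Lambda::Function",
        "Properties": {"Runtime": "python3.13", "MemorySize": 10240},
    },
}

violations = check_node_memory(template)
approved = check_node_memory(template, exceptions=frozenset({"ImageResizer"}))
```

Run in a pipeline, a non-empty `violations` list would fail the build, while the exception set records which teams were granted a documented waiver.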

CyberArk's results: five months to three hours

Ran Isenberg described CyberArk's implementation in concrete terms. CyberArk is an identity and access management company with over 1,000 developers. In 2020, they started a platform engineering team of 15 engineers with the goal of unifying a fragmented developer experience across multiple SaaS solutions. That team has grown to over 100 engineers across two divisions.

Before the platform engineering investment, a senior engineer needed approximately five months to build a production-ready service with the required tooling: CDK infrastructure code, CI/CD pipelines, Lambda best practices, hexagonal architecture scaffolding, tenant isolation libraries, unit and integration tests, observability instrumentation, and frontend setup. Now it takes three hours.
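For a sense of scale, the reduction can be checked with simple arithmetic. The conversion from months to hours is an assumption (the session did not specify one), so the sketch below tries both a working-hours and a calendar-hours reading; either way the reduction comes out above 99%.

```python
# Assumptions: a working month is about 160 hours (20 days x 8 hours),
# and a calendar month is about 720 hours (30 days x 24 hours).
WORKING_HOURS_PER_MONTH = 160
CALENDAR_HOURS_PER_MONTH = 720

AFTER_HOURS = 3
MONTHS_BEFORE = 5


def reduction(hours_per_month: int) -> float:
    """Fraction of the original time that was eliminated."""
    before_hours = MONTHS_BEFORE * hours_per_month
    return 1 - AFTER_HOURS / before_hours


working = reduction(WORKING_HOURS_PER_MONTH)    # 1 - 3/800
calendar = reduction(CALENDAR_HOURS_PER_MONTH)  # 1 - 3/3600
```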

A developer fills out a form in Port, CyberArk's developer portal, specifying a service name, service ID in the SaaS control plane, and GitHub organization. Three hours later, six scaffolded GitHub repositories are deployed across four AWS accounts (dev, test, staging, and integration across two regions). The output includes a backend CRUD API built on Amazon API Gateway, Lambda functions, and DynamoDB tables; a React frontend served from Amazon S3 via CloudFront; feature flag configuration through AWS AppConfig; and SaaS control plane integrations including subdomain routing, certificate provisioning, and cross-account IAM role delegation. Every repository ships with security best practices, observability libraries, and pipeline configurations for each environment.

As AI tooling became a common requirement, CyberArk added an MCP (Model Context Protocol) server blueprint to their catalog. Before the blueprint existed, three separate development teams were independently figuring out authentication, CDK infrastructure, CI/CD pipelines, testing patterns, and observability for their MCP server implementations. Duplicated effort and architectural inconsistencies were the predictable result. The blueprint, built on Lambda Web Adapter running a FastMCP server, gives each team a deployable starting point with those concerns already handled.

The lessons from CyberArk's journey are consistent with Anton's guidance: start with the pattern used by the most teams in your organization, standardize it, and measure the time saved before expanding. Treat the blueprint catalog as a product with an ongoing roadmap, not a one-time project. Documentation and education matter as much as the code itself; if engineers spend more time understanding a blueprint than building the equivalent from scratch, adoption will stall. Build early blueprints with teams who will give direct feedback, because what platform engineers assume developers need and what developers actually need are often different.

Watch the complete session on YouTube: https://youtu.be/GjWaLj4Y86U.

AWS
EXPERT
published 2 months ago, 102 views