re:Invent 2025 - Secure Multi-tenant SaaS with AWS Lambda: A Tenant Isolation Deep Dive

Content level: Advanced

Multi-tenant SaaS applications on serverless platforms share compute by design. That efficiency comes with a specific concern: state from one tenant's invocation can persist into the next tenant's execution context. This session introduces Lambda's new Tenant Isolation Mode, which provides service-managed, per-tenant compute boundaries within a single function, and walks through API Gateway integration and per-tenant observability.

Building a multi-tenant SaaS product on AWS Lambda means accepting that multiple customers share the same function code and, often, the same execution environments. That efficiency is central to serverless economics, but it introduces a real concern: without deliberate handling, state left behind by one tenant's invocation can surface in the next tenant's request. Anton Aleksandrov, Principal Solutions Architect for Serverless at AWS, and Bill Tarr, Principal Partner Solutions Architect at AWS, walked through this problem in depth in re:Invent 2025 session CNS381. This post covers how Lambda execution environments behave in a multi-tenant context, what the new Tenant Isolation Mode does, how to integrate it with Amazon API Gateway, and how to set up per-tenant observability.

The shared compute problem in multi-tenant Lambda

Lambda serves load by reusing execution environments across invocations. When a request arrives and an idle environment is available, Lambda routes the request there rather than initializing a new one. This is by design: reuse avoids cold starts and keeps utilization high relative to cost. In a single-tenant function, this behavior is harmless. In a multi-tenant function, the picture is more complex.

Execution environments are function-level constructs, not tenant-level ones. In-memory state (global and static variables), files written to local disk such as /tmp, environment variables modified at runtime, and cached data all persist within an environment across invocations. If your function writes tenant-specific data during one invocation (cached database connection strings, session state, configuration values) and a subsequent invocation from a different tenant lands in that same environment, that data is still there. Whether the second tenant ever reads it depends on how carefully the code handles cleanup, but the exposure exists at the infrastructure level.

This is not a Lambda-specific concern. Containerized workloads and EC2 instances carry the same risk when shared across tenants. Lambda's short execution environment lifecycle reduces the exposure window, but does not eliminate the concern. The two traditional mitigations each carry trade-offs. A function-per-tenant model provides strong compute isolation but creates significant operational overhead: separate CI/CD pipelines, Identity and Access Management (IAM) configurations, and deployment stacks for potentially tens of thousands of tenants. A shared function with a custom tenant isolation framework is operationally simpler, but places the burden of data cleanup entirely on your development team. As Anton noted in the session, documentation and best practices are not a substitute for a platform-enforced boundary.

Lambda Tenant Isolation Mode

AWS launched Tenant Isolation Mode for Lambda to shift compute isolation responsibility to the service itself. When you create a function with TenantIsolationMode: PER_TENANT, Lambda maintains separate execution environments for each unique tenant identifier you supply at invocation time. Requests from one tenant never land in an environment that served a different tenant.

aws lambda create-function \
  --tenancy-config '{"TenantIsolationMode":"PER_TENANT"}' \
  ...

The function code, IAM execution role, and deployment package remain shared across tenants. You manage one function, not thousands. The execution contexts, however, are strictly separated. Tenant-specific state written to memory or disk stays within that tenant's environments for the lifetime of those environments.

This is a function-creation-time decision. You cannot change the isolation mode on an existing function, which is an intentional restriction to avoid ambiguous security states mid-lifecycle. At invocation time, you pass a TenantId parameter (an alphanumeric string up to 128 characters). Lambda propagates this value into the function's context object, so your handler code has access to the tenant identity at runtime for branching logic or tenant-specific operations. No pre-registration is required, and there is no quota on the number of supported tenants.
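
As a rough sketch of both sides of that contract, the Python below reads the tenant identifier inside a handler and passes it on a direct invocation. Two things here are assumptions to verify against the current Lambda and boto3 documentation: that the Python runtime exposes the value as `context.tenant_id`, and that your installed boto3 version accepts a `TenantId` argument on `invoke`. The `table_prefix` field is purely illustrative.

```python
import json


def handler(event, context):
    # Assumption: the Python runtime exposes the tenant as context.tenant_id.
    tenant_id = getattr(context, "tenant_id", None)
    if tenant_id is None:
        raise ValueError("invocation carried no tenant identifier")
    # Illustrative tenant-specific branching: derive a per-tenant table prefix.
    return {"tenant": tenant_id, "table_prefix": f"{tenant_id}_"}


def invoke_for_tenant(function_name, tenant_id, payload):
    # Imported lazily so the handler above has no SDK dependency.
    import boto3

    client = boto3.client("lambda")
    # Assumption: this boto3 version supports the TenantId parameter.
    return client.invoke(
        FunctionName=function_name,
        TenantId=tenant_id,
        Payload=json.dumps(payload).encode("utf-8"),
    )
```

Failing fast when the identifier is missing keeps a misconfigured caller from silently running tenant-agnostic logic.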

There are trade-offs worth planning around. Cold starts are now per-tenant: because execution environments are not shared across tenant boundaries, a tenant with infrequent invocations will likely see a cold start on most requests. Standard Lambda concurrency quotas still apply, and because the service now maintains more total execution environments across a tenant population, it is prudent to review your account-level concurrency limits before a high-tenant-count deployment. Provisioned Concurrency is not supported with this mode. At launch, Tenant Isolation Mode is available for direct invocations and API Gateway integrations.
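
The cold-start effect can be illustrated with a deliberately simplified toy model in Python. It assumes requests arrive one at a time, one environment per pool, and a fixed idle lifetime of 300 seconds. Lambda does not publish its idle timeout, so that number is an arbitrary assumption, as are the sample request times.

```python
def count_cold_starts(requests, per_tenant, idle_ttl=300.0):
    """Count cold starts for a sequence of (time_seconds, tenant) requests.

    Toy model: a pool's environment stays warm for idle_ttl seconds after
    its last use. With per_tenant=False all tenants share a single pool.
    """
    last_used = {}  # pool key -> time the pool's environment last served a request
    cold = 0
    for t, tenant in requests:
        key = tenant if per_tenant else "shared"
        previous = last_used.get(key)
        if previous is None or t - previous > idle_ttl:
            cold += 1  # no warm environment is available for this pool
        last_used[key] = t
    return cold


sample = [(0, "a"), (10, "b"), (20, "a"), (1000, "b"), (1010, "c")]
shared_cold = count_cold_starts(sample, per_tenant=False)
isolated_cold = count_cold_starts(sample, per_tenant=True)
```

Under these assumptions the shared pool sees 2 cold starts and the per-tenant pools see 4 for the same traffic, which is the pattern to expect when many tenants each invoke infrequently.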

API Gateway integration, noisy neighbor protection, and per-tenant observability

The most common path to a Lambda function in a SaaS product runs through API Gateway. Wiring Tenant Isolation Mode into that path requires mapping a tenant identifier from the inbound request to the X-Amz-Tenant-Id HTTP header that Lambda expects on the integration request.

The source of that identifier is flexible. It can come from a custom HTTP header, a query parameter, a path parameter, a Lambda Authorizer response, a JWT (JSON Web Token) claim, or a subdomain prefix. The session demonstrated a JWT-based approach where the authorizer validates the token, extracts the tenant identifier from its claims, and returns it in the context. A single line of API Gateway integration configuration maps that context value to the required header:

"integration.request.header.X-Amz-Tenant-Id": "context.authorizer.tenantId"

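The authorizer side of that mapping might look like the following Python sketch. It is not the session's code. It assumes an HS256-signed token with a shared secret, a hypothetical `tenant_id` claim, and a REST API Lambda authorizer response shape; a production authorizer would normally use a maintained JWT library, verify against the identity provider's published keys, and check expiry and audience, all of which are omitted here.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(segment):
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims, secret):
    # Local testing helper only: builds a compact HS256 token.
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    digest = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(digest)}"


def verify_hs256(token, secret):
    # Verifies the signature only; expiry and audience checks are omitted.
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))


def build_authorizer_response(claims, method_arn):
    return {
        "principalId": claims["sub"],
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": "Allow", "Resource": method_arn}
            ],
        },
        # Surfaces as context.authorizer.tenantId in the integration mapping.
        "context": {"tenantId": claims["tenant_id"]},  # hypothetical claim name
    }
```

The key in the `context` map must match the name used in the integration mapping (`tenantId` here), since API Gateway resolves `context.authorizer.tenantId` from it.
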
For noisy neighbor protection, API Gateway usage plans provide per-tenant rate limiting before requests reach Lambda. You define plans by tier (for example, a standard plan at 10 requests per second and a premium plan at 30 requests per second) and associate tenants with a plan using the same tenant identifier. API Gateway then enforces the limit before the request reaches Lambda, so no throttling logic is needed inside the function.
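
A minimal Python sketch of the tiered plans follows. The rate limits come from the example above; the burst limits, plan names, and helper names are assumptions added for illustration. The parameter builder is kept separate from the boto3 `create_usage_plan` call so the plan shape can be inspected without AWS access.

```python
# Rate limits follow the example tiers; burst limits are assumed values.
PLAN_TIERS = {
    "standard": {"rateLimit": 10.0, "burstLimit": 20},
    "premium": {"rateLimit": 30.0, "burstLimit": 60},
}


def usage_plan_params(tier, api_id, stage):
    # Builds keyword arguments for API Gateway's create_usage_plan call.
    return {
        "name": f"{tier}-plan",
        "apiStages": [{"apiId": api_id, "stage": stage}],
        "throttle": dict(PLAN_TIERS[tier]),
    }


def create_plans(api_id, stage):
    # Imported lazily so the builder above has no SDK dependency.
    import boto3

    client = boto3.client("apigateway")
    plan_ids = {}
    for tier in PLAN_TIERS:
        response = client.create_usage_plan(**usage_plan_params(tier, api_id, stage))
        plan_ids[tier] = response["id"]
    return plan_ids
```

Usage plans meter traffic per API key, so each tenant still needs a key attached to its plan (for example with `create_usage_plan_key`). How that key is derived from the tenant identifier is left out of this sketch.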

The session also covered tenant-scoped IAM credentials as a complement to compute isolation. Using a Lambda Authorizer, you can call AWS Security Token Service (STS) to retrieve short-lived, tenant-scoped credentials after validating the inbound JWT. Those credentials travel through the API Gateway context to the Lambda function alongside the standard function execution role. The execution role handles permissions that apply broadly across the function (shared storage access, for example), while the tenant-scoped credentials restrict downstream resource access to that specific tenant's data. A function operating on behalf of the blue tenant can read from the blue tenant's Amazon S3 bucket but not the green or yellow tenant's buckets, because the tenant-scoped credentials do not grant those permissions.
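
One way to obtain such credentials, sketched below in Python, is to call STS `AssumeRole` with an inline session policy that narrows the role to a single tenant's bucket. The session did not specify this exact mechanism in the text above, so treat it as one possible approach. The bucket naming scheme (`<prefix>-<tenant>`), the role ARN, and the helper names are hypothetical.

```python
import json


def tenant_session_policy(bucket_prefix, tenant_id):
    # Hypothetical naming scheme: one bucket per tenant, "<prefix>-<tenant>".
    bucket = f"{bucket_prefix}-{tenant_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def assume_tenant_role(role_arn, tenant_id, policy):
    # Imported lazily so the policy builder has no SDK dependency.
    import boto3

    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"tenant-{tenant_id}",
        Policy=json.dumps(policy),  # session policy: can only narrow the role
        DurationSeconds=900,  # shortest duration AssumeRole accepts
    )
    return response["Credentials"]
```

Because a session policy is intersected with the role's own permissions, the resulting credentials can never exceed what the role allows; they only restrict it to the named tenant's objects.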

Per-tenant observability follows from the isolation architecture. When JSON-format logging is enabled on the function, Lambda injects the tenant ID into each log entry automatically, so no manual instrumentation is needed to tag log lines with a tenant identifier. Because execution environments are tenant-specific, each Amazon CloudWatch log stream belongs to a single tenant. You can filter Live Tail output by tenant ID in real time, or use CloudWatch Logs Insights to query log streams and messages scoped to a specific tenant across a defined time window. For custom metrics, Powertools for AWS Lambda supports adding a tenant dimension to metric emissions, enabling per-tenant operational data alongside function-level metrics.
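
To show what a tenant dimension looks like on the wire, the Python sketch below builds a CloudWatch Embedded Metric Format (EMF) record by hand using only the standard library. This is not the Powertools API. It is the underlying log format that Powertools emits, written out directly for illustration. The namespace `SaaSApp`, the dimension name `tenant_id`, and the metric name are assumptions.

```python
import json
import time


def tenant_metric_record(tenant_id, name, value, namespace="SaaSApp",
                         unit="Count", now_ms=None):
    # EMF: the _aws block declares which top-level keys are metrics and
    # which are dimensions; the values themselves live at the top level.
    timestamp = int(time.time() * 1000) if now_ms is None else now_ms
    return {
        "_aws": {
            "Timestamp": timestamp,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["tenant_id"]],
                    "Metrics": [{"Name": name, "Unit": unit}],
                }
            ],
        },
        "tenant_id": tenant_id,
        name: value,
    }


def emit(record):
    # In Lambda, a JSON line on stdout reaches CloudWatch Logs, where
    # EMF records are extracted into metrics.
    print(json.dumps(record))
```

Calling `emit(tenant_metric_record("blue", "OrdersCreated", 1))` would publish an `OrdersCreated` metric dimensioned by tenant. Note that a high-cardinality dimension such as a tenant ID creates one metric per tenant, which is worth factoring into CloudWatch cost planning.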

What this changes for SaaS builders

Tenant Isolation Mode does not replace every isolation strategy. If your customers have contractual or regulatory requirements for account-level separation, a dedicated-account model remains the right architecture. But for multi-tenant SaaS builders who want the cost and operational benefits of a pooled function without taking on compute-level isolation in application code, this capability closes a meaningful gap.

The shift is one of responsibility. Previously, preventing cross-tenant state leakage required discipline at the code level. A missed cleanup step, a new engineer unfamiliar with the isolation framework, or a rushed feature could create an exposure. Now, the execution boundary is enforced by the service. Your code still benefits from being stateless and clean, but a lapse in discipline is contained within a tenant's own environments rather than crossing tenant lines.

For sample code and additional resources from this session, see Serverless Land, where the team publishes patterns, reference architectures, and weekly office hours covering new Lambda capabilities in depth.


Watch the full session: Secure Multi-tenant SaaS with AWS Lambda: A Tenant Isolation Deep Dive (CNS381)