re:Invent 2025 - From Trigger to Execution: The Journey of Events in AWS Lambda
Every Lambda invocation starts with a trigger, but what happens between that trigger and your code running is far more complex than most developers realize. This session pulls back the curtain on Lambda's architecture, covering how events travel through the system, how the event source mapper works at scale, and what queuing theory has to do with a serverless compute service.
When you invoke an AWS Lambda function, you're interacting with a system that processes more than 15 trillion requests each month, including 1.7 trillion invocations on Prime Day alone. Most of that complexity is hidden by design. Julian Wood, Principal Serverless Developer Advocate at AWS, and Rajesh Kumar Pandey, Principal Software Engineer at AWS Lambda, delivered session CNS423 to explain exactly what happens between a trigger and execution. In this post, we'll walk through the three invoke types, how Lambda's event source mapper handles stream and queue integrations, and the operational lessons the Lambda team has drawn from queuing theory that keep the service stable under pressure.
The Three Paths an Event Can Take
Lambda exposes three invoke models, and each follows a distinct path through the system.
Synchronous invocations, whether you call Lambda directly or through Amazon API Gateway, send the request through the Lambda API frontend, which is a multi-Availability Zone (AZ) load balancer. From there, the invoke reaches the sync invoke service (internally called the Frontend Invoke Service), which authenticates the request, checks quota limits through a counting service that must respond in under 1.5 milliseconds, and contacts the assignment service to find or create an execution environment on a worker host. If this is the first call, the placement service spins up a new execution environment, loads your code or container image, and runs the initialization process. Subsequent calls skip that cold path entirely and route the payload directly to the already-warm execution environment.
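To make the warm and cold paths concrete, here is a toy Python model of the routing decision. It is a hypothetical sketch, not Lambda's implementation: `ToyAssignmentService`, `ExecutionEnvironment`, and `sync_invoke` are illustrative names, and the model leaves out authentication, quota checks, and concurrency entirely. It only shows that the first call for a function pays the cold-start cost and later calls reuse the warm environment.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionEnvironment:
    """Hypothetical stand-in for an execution environment on a worker host."""
    function_name: str
    invocations: int = 0


@dataclass
class ToyAssignmentService:
    """Reuse a warm environment when one exists; otherwise create one (cold start)."""
    warm: dict = field(default_factory=dict)
    cold_starts: int = 0

    def acquire(self, function_name: str) -> ExecutionEnvironment:
        env = self.warm.get(function_name)
        if env is None:
            # Cold path: placement creates the environment and runs init once.
            env = ExecutionEnvironment(function_name)
            self.warm[function_name] = env
            self.cold_starts += 1
        return env


def sync_invoke(svc: ToyAssignmentService, function_name: str, payload, handler):
    """Route the payload to an environment and return the handler's result to the caller."""
    env = svc.acquire(function_name)
    env.invocations += 1
    return handler(payload)


svc = ToyAssignmentService()
echo = lambda payload: {"echo": payload}
first = sync_invoke(svc, "my-fn", 1, echo)   # cold: environment created
second = sync_invoke(svc, "my-fn", 2, echo)  # warm: environment reused
```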
Asynchronous invocations take a different route. When you trigger a function asynchronously (through an Amazon Simple Storage Service (Amazon S3) notification, an Amazon EventBridge rule, or a direct async API call), Lambda accepts the event, writes it to an internal Amazon SQS queue, and immediately returns a 202 acknowledgment. A separate fleet of poller instances then reads from those internal queues and submits the events to the sync invoke service. This separation is intentional. Lambda split the async and sync data planes after a large async spike overwhelmed the sync service, and the architectural separation now protects synchronous callers from async traffic floods.
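The accept-then-buffer pattern can be sketched in a few lines. In this hypothetical model an in-memory `deque` stands in for Lambda's durable internal SQS queues, and `ToyAsyncInvoker`, `invoke_async`, and `poll_once` are illustrative names. The point is the decoupling: the caller gets a 202 as soon as the event is buffered, and a separate poller later hands each event to the synchronous path.

```python
from collections import deque


class ToyAsyncInvoker:
    """Accept an event, buffer it, and acknowledge before the function runs."""

    def __init__(self, sync_invoke):
        self.queue = deque()            # stands in for Lambda's internal SQS queues
        self.sync_invoke = sync_invoke  # the synchronous path the poller calls into

    def invoke_async(self, event) -> int:
        self.queue.append(event)
        return 202                      # accepted, not yet processed

    def poll_once(self, max_batch: int = 10) -> int:
        """One poller pass: drain up to max_batch events into the sync path."""
        processed = 0
        while self.queue and processed < max_batch:
            self.sync_invoke(self.queue.popleft())
            processed += 1
        return processed


results = []
invoker = ToyAsyncInvoker(sync_invoke=results.append)
statuses = [invoker.invoke_async({"n": i}) for i in range(15)]
drained = [invoker.poll_once(), invoker.poll_once()]
```

Because the caller's acknowledgment never waits on the poller, a burst of async events grows the buffer rather than the load on the sync path.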
Event source mapping (ESM) is the third model and handles integrations with queues and streams. A producer writes messages to a source like Amazon Kinesis, Amazon DynamoDB Streams, Apache Kafka, or SQS, and Lambda manages a fleet of pollers that read those messages and invoke your function synchronously. All three models converge at the same sync invoke path. Every Lambda invocation is ultimately a synchronous call.
Lambda as a Queuing Service
Rajesh Pandey opened the architectural deep dive with a framing that surprises many developers: Lambda is also a queuing service, and the team has applied queuing theory systematically to its design.
Queuing theory models how work arrives, waits, and gets processed. The core insight is that arrival rates and service rates are both variable, and variance in either one causes backlogs to form even when average capacity exceeds average demand. The Lambda team applied four lessons from this model.
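A short simulation shows why variance alone creates backlogs. This sketch uses Lindley's recursion for a single-server FIFO queue, W(n+1) = max(0, W(n) + S(n) - A(n+1)), where S is service time and A is the gap between arrivals. Both runs below have the same 80% utilization; only the variance differs. It is a textbook model chosen for illustration, not a model of Lambda's internals.

```python
import random


def mean_wait(interarrivals: list, services: list) -> float:
    """Mean queueing delay for a single-server FIFO queue via Lindley's recursion.

    interarrivals[n] is the gap between arrival n-1 and arrival n (index 0 is unused);
    services[n] is the service time of job n.
    """
    wait, total = 0.0, 0.0          # job 0 arrives to an empty queue
    for n in range(1, len(services)):
        wait = max(0.0, wait + services[n - 1] - interarrivals[n])
        total += wait
    return total / len(services)


N = 100_000
rng = random.Random(42)

# Same means in both runs: one arrival per time unit, 0.8 time units of work each.
steady = mean_wait([1.0] * N, [0.8] * N)
bursty = mean_wait([rng.expovariate(1.0) for _ in range(N)],
                   [rng.expovariate(1 / 0.8) for _ in range(N)])
```

With perfectly regular arrivals and service, no job ever waits. With exponential arrivals and service at the same means (an M/M/1 queue), theory predicts an average wait of rho / (mu - lambda) = 0.8 / (1.25 - 1) = 3.2 time units, and the simulated `bursty` value should land in that neighborhood.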
First, buffers smooth variance. Lambda's async invocation path uses an ingestion tier that accepts highly variable multi-tenant traffic, persists messages durably, and normalizes them before processing. When a customer's traffic spikes dramatically, the buffer absorbs the burst without the downstream processing layer needing to scale instantaneously. Lambda also uses shuffle sharding on its internal queues: instead of hashing a customer to a single queue partition (which risks hot partitions when customers overlap), each customer hashes to multiple queue candidates and Lambda writes to the one with the lowest depth at that moment. High-traffic customers also get dedicated "express lane" queues that keep their traffic from affecting shared queue partitions.
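The queue-selection idea combines shuffle sharding with picking the least-loaded option. Below is a hypothetical Python sketch: the hash-ranking scheme used to derive each customer's candidate queues, the `k=2` default, and the `express_lanes` mapping are assumptions for illustration, since the session did not describe Lambda's exact hashing.

```python
import hashlib


def candidate_queues(customer_id: str, num_queues: int, k: int) -> list:
    """Shuffle sharding: map a customer to k distinct queues, stable across calls."""
    # Rank every queue by a hash of (customer, queue) and keep the top k (illustrative scheme).
    ranked = sorted(
        range(num_queues),
        key=lambda q: hashlib.sha256(f"{customer_id}:{q}".encode()).digest(),
    )
    return ranked[:k]


def choose_queue(customer_id: str, depths: list, k: int = 2, express_lanes=None):
    """Write to the shallowest of the customer's candidates; express-lane customers bypass them."""
    if express_lanes and customer_id in express_lanes:
        return express_lanes[customer_id]
    candidates = candidate_queues(customer_id, len(depths), k)
    return min(candidates, key=lambda q: depths[q])


depths = [5, 0, 12, 3, 7, 9, 1, 4]           # current backlog per shared queue
cands = candidate_queues("customer-a", len(depths), k=2)
picked = choose_queue("customer-a", depths)
express = choose_queue("big-customer", depths, express_lanes={"big-customer": "express-1"})
```

Two customers that collide on one queue are unlikely to share all of their candidates, so a noisy neighbor can usually be routed around rather than queued behind.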
Second, workers should match their workload type. Polling and execution are fundamentally different in character. Polling is continuous and stateful, maintaining long-lived connections to event sources. Execution is bursty, stateless, and short-lived. Lambda separates these into distinct worker fleets with different scaling characteristics. The team then took this further, slicing poller workers horizontally into three independently scalable security zones: one responsible for connecting to the event source, one handling internal Lambda service communication, and one handling function invocations. This decomposition means a Kafka consumer rebalancing event, for example, doesn't affect throughput on the invocation side.
Third, variance must be controlled to prevent instability. Lambda builds multiple layers of defense: batching at the ingestion tier to reduce per-message overhead, concurrency caps to prevent a single function from consuming the entire poller fleet, throttling when pollers are saturated, and back-pressure mechanisms that slow polling rather than overwhelm downstream services.
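Two of these defenses, concurrency caps and back-pressure, reduce to simple arithmetic on the poller side. This is a hypothetical sketch: `PollerState`, `records_to_fetch`, and the exponential backoff parameters in `next_poll_delay` are illustrative assumptions, not Lambda's actual tuning.

```python
from dataclasses import dataclass


@dataclass
class PollerState:
    in_flight: int         # invocations currently running for this function
    max_concurrency: int   # cap so one function cannot consume the whole fleet
    batch_size: int        # records per invocation


def records_to_fetch(state: PollerState, downstream_throttled: bool) -> int:
    """Back-pressure: fetch only what free concurrency can absorb, nothing while throttled."""
    if downstream_throttled:
        return 0
    free_slots = max(0, state.max_concurrency - state.in_flight)
    return free_slots * state.batch_size


def next_poll_delay(consecutive_throttles: int, base_s: float = 0.1, cap_s: float = 5.0) -> float:
    """Exponential backoff between polls while throttled, bounded by cap_s."""
    return min(cap_s, base_s * 2 ** consecutive_throttles)
```

The design choice is that the poller slows its own intake instead of pulling records it cannot hand off, so unprocessed work stays in the source rather than piling up in memory.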
Fourth, distributed systems need explicitly built global state. Queuing theory assumes a scheduler can see all workers and all work. In a distributed system, that requires construction. Lambda's assignment manager maintains this global view, assigning pollers to specific event sources, detecting when a poller holds a lease but isn't making progress, and reassigning that work to a healthy poller. This coordination layer is what allows the service to continue functioning when individual components fail.
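Lease tracking with progress detection is a common coordination pattern, and a toy version makes the idea concrete. In this hypothetical sketch, `ToyAssignmentManager` treats a lease as stalled when its owner has not reported progress within `stall_timeout` seconds and moves it to a healthy poller in round-robin order. Real systems also need durable storage and fencing so that a slow former owner cannot keep acting after reassignment; this model only ignores its late progress reports.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    source_id: str
    owner: str
    last_progress: float    # time of the owner's last checkpoint or progress report


class ToyAssignmentManager:
    """Toy global view: who owns which event source, and whether they are making progress."""

    def __init__(self, stall_timeout: float):
        self.stall_timeout = stall_timeout
        self.leases = {}    # source_id -> Lease

    def assign(self, source_id: str, poller: str, now: float) -> None:
        self.leases[source_id] = Lease(source_id, poller, now)

    def report_progress(self, source_id: str, poller: str, now: float) -> None:
        lease = self.leases[source_id]
        if lease.owner == poller:   # ignore reports from a poller that lost the lease
            lease.last_progress = now

    def reassign_stalled(self, healthy_pollers: list, now: float) -> dict:
        """Move leases whose owner holds them but has not progressed within stall_timeout."""
        stalled = [l for l in self.leases.values()
                   if now - l.last_progress > self.stall_timeout]
        moved = {}
        for i, lease in enumerate(stalled):
            new_owner = healthy_pollers[i % len(healthy_pollers)]
            self.assign(lease.source_id, new_owner, now)
            moved[lease.source_id] = new_owner
        return moved


mgr = ToyAssignmentManager(stall_timeout=30.0)
mgr.assign("shard-1", "poller-a", now=0.0)
mgr.assign("shard-2", "poller-b", now=0.0)
mgr.report_progress("shard-1", "poller-a", now=40.0)   # poller-a keeps making progress
moved = mgr.reassign_stalled(["poller-c"], now=45.0)   # poller-b silent since t=0
```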
Inside the Event Source Mapper
The ESM is what most developers interact with indirectly when they configure a trigger on a Kinesis stream or an SQS queue. The session covered what happens inside it in useful detail.
When a poller picks up work, it configures the appropriate connector for the event source (KCL for Kinesis, a Kafka consumer for MSK or self-managed Kafka, an SQS client for queues) and begins pulling records into an in-memory buffer. This buffer is where filtering and batching happen before any Lambda invocation is made. Filtering lets you define criteria using EventBridge filter syntax to drop records your function doesn't need to process. Batching groups records into a single payload (up to 10,000 records or 6 MB) to reduce per-invocation overhead. The batch window setting gives the poller time to accumulate records when traffic is low, improving efficiency without increasing latency during high-traffic periods.
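The filter-then-batch stage can be modeled as a small buffer with three flush triggers: record count, payload size, and an elapsed batching window. This sketch is hypothetical. `matches` implements only a simplified subset of EventBridge-style patterns (lists of allowed exact values, nested objects), `ToyBatcher` is an illustrative name, and payload size is approximated by the length of each record's JSON encoding.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


def matches(pattern: dict, record: dict) -> bool:
    """Simplified EventBridge-style match: lists are OR-ed exact values, dicts recurse."""
    for key, expected in pattern.items():
        if key not in record:
            return False
        value = record[key]
        if isinstance(expected, dict):
            if not isinstance(value, dict) or not matches(expected, value):
                return False
        elif value not in expected:
            return False
    return True


@dataclass
class ToyBatcher:
    batch_size: int = 10_000
    max_bytes: int = 6 * 1024 * 1024
    window_s: float = 0.0
    buffer: list = field(default_factory=list)
    size: int = 0
    opened_at: Optional[float] = None

    def add(self, record: dict, now: float) -> list:
        """Buffer one record and return any batches that became ready."""
        ready = []
        n = len(json.dumps(record).encode())
        if self.buffer and self.size + n > self.max_bytes:
            ready.append(self.flush())      # adding this record would exceed the payload cap
        if not self.buffer:
            self.opened_at = now            # the batching window starts with the first record
        self.buffer.append(record)
        self.size += n
        if len(self.buffer) >= self.batch_size:
            ready.append(self.flush())      # count limit reached
        return ready

    def tick(self, now: float) -> list:
        """Flush a partial batch once the batching window has elapsed."""
        if self.buffer and now - self.opened_at >= self.window_s:
            return [self.flush()]
        return []

    def flush(self) -> list:
        batch = self.buffer
        self.buffer, self.size, self.opened_at = [], 0, None
        return batch


# Usage: keep only "order" records, then batch them three at a time.
pattern = {"body": {"type": ["order"]}}
records = [{"body": {"type": "order" if i % 2 == 0 else "ping", "id": i}} for i in range(10)]
kept = [r for r in records if matches(pattern, r)]

batcher = ToyBatcher(batch_size=3, window_s=5.0)
batches = []
for t, record in enumerate(kept):
    batches += batcher.add(record, now=float(t))
batches += batcher.tick(now=100.0)          # window elapsed: ship the partial batch
```

Under heavy traffic the count or size trigger fires first, so the window only matters when records trickle in.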
Streams (Kinesis, DynamoDB Streams, Kafka) and queues (SQS) have different scaling semantics that reflect different goals. For streams, the challenge is keeping up with the ingestion rate. Lambda scales by adding poller instances to match new shards or partitions, and the parallelization factor setting for Kinesis lets you run up to ten Lambda invocations per shard simultaneously while still preserving ordering per partition key. For queues, the challenge is not overwhelming downstream services. SQS acts as a shock absorber, and the maximum concurrency setting on the ESM gives you flow control over how fast Lambda drains the queue. Use maximum concurrency (not reserved concurrency) to manage drain rate; if you use both together, set reserved concurrency at least as high as maximum concurrency so that Lambda doesn't throttle invocations the ESM is allowed to make.
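Running several invocations per shard without breaking ordering works by keeping every partition key on a single lane. This Python sketch shows the idea; the SHA-256-based key hashing and the `route_by_partition_key` name are assumptions for illustration, not Lambda's documented routing. Records for the same key stay in sequence order within their lane, while different keys can be processed in parallel.

```python
import hashlib
from collections import defaultdict


def route_by_partition_key(records: list, parallelization_factor: int) -> dict:
    """Split one shard's records into up to N lanes, keeping each partition key on one lane."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    lanes = defaultdict(list)
    for rec in records:                    # records arrive in shard (sequence) order
        digest = hashlib.sha256(rec["partition_key"].encode()).digest()
        lane = int.from_bytes(digest[:8], "big") % parallelization_factor
        lanes[lane].append(rec)            # appending preserves per-key order
    return dict(lanes)


shard = [{"partition_key": k, "seq": i} for i, k in enumerate("ABACBCAB")]
lanes = route_by_partition_key(shard, parallelization_factor=3)
```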
Error handling also differs between the two source types. For streams, a failed record blocks further processing of that shard or partition to preserve order, because Lambda retries the batch until it succeeds or the records expire. Lambda's bisect-on-error feature mitigates this by splitting a failing batch in half and retrying each half separately, narrowing down to the failing record without discarding the entire batch. For queues, failures don't block other messages. Partial batch response lets your function explicitly report which records failed so that Lambda retries only those, rather than reprocessing the entire batch (the same feature is also available for Kinesis and DynamoDB Streams). Records that still fail need somewhere to go: for stream and Kafka sources, an on-failure destination captures records that exhaust their retry attempts or exceed the maximum record age, while for SQS you configure a dead-letter queue on the source queue itself, which receives messages after they exceed the queue's maximum receive count.
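Both failure-handling styles can be sketched in Python. `bisect_failures` is a hypothetical model of recursive halving that narrows all the way to single records; Lambda's real behavior also depends on the retry and record-age settings on the mapping. `handler` follows the documented partial batch response shape for SQS, returning `batchItemFailures` entries with an `itemIdentifier`, which Lambda honors when `ReportBatchItemFailures` is enabled on the event source mapping. `process_message`, `process_batch`, and the "poison" convention are illustrative.

```python
def process_message(body: str) -> None:
    """Hypothetical business logic: anything marked 'poison' fails."""
    if "poison" in body:
        raise ValueError(f"cannot process {body!r}")


def process_batch(batch: list) -> None:
    for body in batch:
        process_message(body)


def bisect_failures(batch: list, process) -> list:
    """Model of bisect-on-error: halve a failing batch until poison records are isolated."""
    try:
        process(batch)
        return []
    except Exception:
        if len(batch) <= 1:
            return list(batch)      # isolated record: candidate for the on-failure destination
        mid = len(batch) // 2
        return bisect_failures(batch[:mid], process) + bisect_failures(batch[mid:], process)


def handler(event, context):
    """SQS handler using partial batch response (needs ReportBatchItemFailures on the ESM)."""
    failures = []
    for record in event["Records"]:
        try:
            process_message(record["body"])
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


bodies = ["ok-1", "ok-2", "poison-3", "ok-4", "ok-5", "ok-6", "poison-7", "ok-8"]
isolated = bisect_failures(bodies, process_batch)
event = {"Records": [{"messageId": f"m{i}", "body": b} for i, b in enumerate(bodies)]}
response = handler(event, None)
```

In the bisect model, records in a retried half that already succeeded are processed again, which is one reason stream handlers should be idempotent.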
Provisioned mode, first available for Kafka event sources and more recently for SQS, addresses high-throughput scenarios directly. You configure minimum and maximum poller counts, and Lambda keeps that capacity warm rather than scaling reactively. For SQS, this can increase throughput up to 16 times compared to standard polling (up to 20,000 concurrent executions versus 1,250). For Kafka sources, provisioned mode also supports schema registry integration and binary formats like Avro and Protobuf.
Conclusion
The session covered a substantial amount of architectural ground, but the practical takeaway is clear: the complexity inside Lambda exists so that you don't have to build it yourself. Retry logic, connection pooling, message ordering, back-pressure, lease management, AZ failover, and backlog recovery are all handled at the service level. Your responsibility is configuration (batch size, concurrency limits, error destinations, filtering rules), not implementation.
If you work with event-driven architectures on AWS, understanding the distinction between streams and queues in the ESM is the most immediately useful piece of knowledge from this session. The flow-control semantics are different by design, and choosing the right settings depends on whether your goal is maximum throughput or downstream protection. The session slides and session recording give you everything you need to go deeper.
Watch the full session: CNS423 - From Trigger to Execution: The Journey of Events in AWS Lambda