
re:Invent 2025 - Supercharge Serverless testing: Accelerate development with Kiro

7 minute read
Content level: Advanced

Serverless applications are highly distributed and touch many cloud-native services, which makes testing more complex than it appears. This session shows how to structure your Lambda code for testability, apply in-memory fakes with dependency injection, use property-based testing for business logic, and integrate agentic AI throughout your development lifecycle.

Serverless testing is harder than it looks, and the difficulty usually traces back to one root cause: code that mixes HTTP handling, business logic, and AWS service calls in the same place. Arthi Jaganathan, Principal Specialist Solutions Architect for Application Modernization at AWS, and Tomas Mihalyi, Senior Specialist Solutions Architect for Amazon Q at AWS, walked through how to fix this problem at a code level and how to use Kiro, an agentic coding tool from AWS, to assist with the refactoring and test generation. In this post, we'll walk through how to design serverless code that is easier to test, how to write unit and integration tests that remain stable as your application evolves, and how to apply agentic AI across the full development lifecycle.

The session used a Python task management API built on Amazon API Gateway, AWS Lambda, and Amazon DynamoDB, with an asynchronous path publishing events to Amazon EventBridge. The original code used the Moto library to mock boto3 calls in unit tests. This approach worked, but it tightly coupled the tests to specific AWS service choices. Swapping DynamoDB for another database, for example, would have meant rewriting tests that validated handler logic rather than database behavior.

Designing Serverless Code for Testability

Tomas used Kiro's custom agent capability to run an automated architectural audit of the codebase. Kiro read the entire project, referenced configured MCP (Model Context Protocol) servers for additional context, and produced a hexagonal architecture audit report identifying where coupling was creating testing friction. He then used Kiro's spec-driven development flow to generate a structured migration plan with requirements, a design document including architecture diagrams, and a task list for the refactoring. From there, Kiro worked through the tasks to reshape the application.

The resulting architecture separated the codebase into three distinct layers. The Lambda handler managed only HTTP request parsing and response formatting. A domain layer held the business rules, including a circular dependency check for tasks. An integration layer contained the actual calls to DynamoDB and EventBridge, implementing the interfaces the domain layer defined. The domain layer had no knowledge of which AWS services backed it. It depended on a TaskRepositoryProtocol and an event publisher protocol, both of which defined method contracts without any implementation.
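The domain layer's contracts can be sketched with `typing.Protocol`. This is a minimal illustration, not the session's code: only the name TaskRepositoryProtocol comes from the session, while `Task`, `TaskService`, `EventPublisherProtocol`, the method names, and the self-dependency rule are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol


@dataclass
class Task:
    # Hypothetical domain entity; the field names are assumptions.
    task_id: str
    title: str
    depends_on: List[str] = field(default_factory=list)


class TaskRepositoryProtocol(Protocol):
    """Contract defined by the domain layer and implemented by the integration layer."""

    def get_task(self, task_id: str) -> Optional[Task]: ...

    def save_task(self, task: Task) -> None: ...


class EventPublisherProtocol(Protocol):
    """Hypothetical name for the event publisher contract."""

    def publish(self, event_type: str, detail: dict) -> None: ...


class TaskService:
    """Domain layer: business rules only, with no boto3 import anywhere."""

    def __init__(
        self,
        repository: TaskRepositoryProtocol,
        publisher: EventPublisherProtocol,
    ) -> None:
        self._repository = repository
        self._publisher = publisher

    def create_task(self, task: Task) -> Task:
        # Simplified stand-in for a business rule: a task cannot depend on itself.
        if task.task_id in task.depends_on:
            raise ValueError("a task cannot depend on itself")
        self._repository.save_task(task)
        self._publisher.publish("TaskCreated", {"task_id": task.task_id})
        return task
```

Because the protocols are structural, a DynamoDB-backed repository and a dictionary-backed test double both satisfy `TaskRepositoryProtocol` without inheriting from it.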

This decoupling has a direct payoff for testing. If you replace DynamoDB with a different database, you rewrite the integration layer. The handler tests and domain tests remain unchanged, because the code they validate did not change.

Arthi also highlighted one small addition that made dependency injection straightforward without requiring a framework. The application stored the active task service in a module-level variable. At runtime, if the variable was unset, the real service initialized with production dependencies. In tests, a pytest fixture set that variable to an in-memory fake before each test and cleared it in teardown. No runtime patching was needed.
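A minimal sketch of that module-level injection point, assuming pytest as the test runner. The names `_task_service`, `get_task_service`, `set_task_service`, and both service classes are hypothetical and chosen only for illustration.

```python
import json

import pytest

# Module-level slot for the active service. It stays None until first use.
_task_service = None


class RealTaskService:
    """Placeholder for the production service wired to DynamoDB and EventBridge."""


class FakeTaskService:
    """In-memory fake used by tests; it returns a canned task."""

    def create_task(self, title):
        return {"task_id": "task-1", "title": title}


def get_task_service():
    """Return the injected service, or lazily build the real one at runtime."""
    global _task_service
    if _task_service is None:
        _task_service = RealTaskService()
    return _task_service


def set_task_service(service):
    """Test hook: inject a fake, or pass None to clear the slot."""
    global _task_service
    _task_service = service


def lambda_handler(event, context):
    # The handler only parses the request and formats the response.
    body = json.loads(event["body"])
    task = get_task_service().create_task(body["title"])
    return {"statusCode": 201, "body": json.dumps(task)}


@pytest.fixture
def fake_task_service():
    fake = FakeTaskService()
    set_task_service(fake)   # setup: inject the fake before the test runs
    yield fake
    set_task_service(None)   # teardown: clear the slot for the next test


def test_create_task_returns_201(fake_task_service):
    event = {"body": json.dumps({"title": "Write tests"})}
    response = lambda_handler(event, None)
    assert response["statusCode"] == 201
```

The handler never learns whether it is talking to a fake, and nothing is monkeypatched. The trade-off is shared module state, which is why the fixture clears the slot in teardown.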

Writing Tests That Scale With Your Application

With the architecture decoupled, the session turned to three specific testing patterns that take advantage of that structure.

For handler unit tests, the team replaced Moto with an in-memory fake task service. The fake implemented the same protocol as the real service and accepted constructor flags to control its behavior. Passing raise_conflict_error=True caused the fake's create_task method to raise the expected exception. Passing no flags let it follow the success path. Because all the configuration complexity lived in the fake class, individual test cases became short and readable. A test for the circular dependency error scenario, which previously required building a dependency graph in a mock DynamoDB table, was reduced to instantiating the fake with the appropriate flag and invoking the handler. Arthi noted that standard mocks remain appropriate for third-party dependencies like email services, where you only need to verify that a method was called rather than simulate complex behavior.
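A fake along these lines might look like the sketch below. The flag raise_conflict_error and the method create_task are named in the session. The exception classes, the second flag, the handler function, and the 409 and 400 status codes are assumptions for illustration.

```python
import json


class ConflictError(Exception):
    """Hypothetical domain exception for a conflicting write."""


class CircularDependencyError(Exception):
    """Hypothetical domain exception for a dependency cycle."""


class FakeTaskService:
    """In-memory fake whose behavior is selected through constructor flags."""

    def __init__(self, raise_conflict_error=False, raise_circular_dependency_error=False):
        self.raise_conflict_error = raise_conflict_error
        self.raise_circular_dependency_error = raise_circular_dependency_error
        self.created = []  # records successful calls so tests can inspect them

    def create_task(self, title, depends_on=None):
        if self.raise_conflict_error:
            raise ConflictError("task was modified concurrently")
        if self.raise_circular_dependency_error:
            raise CircularDependencyError("dependency would create a cycle")
        task = {"task_id": "task-1", "title": title, "depends_on": depends_on or []}
        self.created.append(task)
        return task


def handle_create_task(event, service):
    """Handler logic under test; the status code mapping is an assumption."""
    body = json.loads(event["body"])
    try:
        task = service.create_task(body["title"], body.get("depends_on"))
    except ConflictError as exc:
        return {"statusCode": 409, "body": json.dumps({"message": str(exc)})}
    except CircularDependencyError as exc:
        return {"statusCode": 400, "body": json.dumps({"message": str(exc)})}
    return {"statusCode": 201, "body": json.dumps(task)}


def test_circular_dependency_returns_400():
    # No DynamoDB table and no dependency graph: one flag selects the scenario.
    service = FakeTaskService(raise_circular_dependency_error=True)
    event = {"body": json.dumps({"title": "Ship release", "depends_on": ["task-2"]})}
    assert handle_create_task(event, service)["statusCode"] == 400
```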

For pure domain logic like the circular dependency algorithm, the session introduced property-based testing using the Hypothesis library. Rather than writing individual test cases with handpicked inputs, you define the shape of valid inputs and Hypothesis generates the combinations automatically. Arthi defined two lists of UUIDs representing a main task chain and a branching chain, with constraints to keep them non-overlapping and within size bounds. Hypothesis then executed 100 test cases, each with its own generated graph, verifying that an introduced cycle was detected every time. The added overhead for the whole run was approximately 300 to 400 milliseconds, and failures surface the exact inputs that caused them. Property-based tests work well for pure functions, business rule validation, and serialization logic. They are less applicable to end-to-end workflows that require specific inputs to trigger particular paths.
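The following sketch shows what such a property could look like with Hypothesis. The cycle check `would_create_cycle`, the graph builder, and the size bounds are assumptions written for this example and are not the session's implementation.

```python
from hypothesis import assume, given, settings, strategies as st


def would_create_cycle(dependencies, task_id, new_dependency):
    """Return True if making task_id depend on new_dependency closes a cycle.

    `dependencies` maps each task to the tasks it depends on.
    """
    stack, seen = [new_dependency], set()
    while stack:
        current = stack.pop()
        if current == task_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(dependencies.get(current, []))
    return False


def build_graph(main_chain, branch_chain):
    """Each task depends on the next one in its chain; the branch hangs off the main head."""
    graph = {}
    for chain in (main_chain, branch_chain):
        for task, dependency in zip(chain, chain[1:]):
            graph.setdefault(task, []).append(dependency)
    graph.setdefault(main_chain[0], []).append(branch_chain[0])
    return graph


@settings(max_examples=100, deadline=None)  # deadline disabled to avoid flaky timing failures
@given(
    main_chain=st.lists(st.uuids(), min_size=2, max_size=10, unique=True),
    branch_chain=st.lists(st.uuids(), min_size=1, max_size=10, unique=True),
)
def test_cycle_is_always_detected(main_chain, branch_chain):
    # Constraint: the two chains must not share any task.
    assume(set(main_chain).isdisjoint(branch_chain))
    graph = build_graph(main_chain, branch_chain)

    # Pointing either tail back at the main head closes a loop.
    assert would_create_cycle(graph, main_chain[-1], main_chain[0])
    assert would_create_cycle(graph, branch_chain[-1], main_chain[0])

    # A redundant forward edge (head depends on tail) is not a cycle.
    assert not would_create_cycle(graph, main_chain[0], main_chain[-1])
```

Calling `test_cycle_is_always_detected()` directly, or letting pytest collect it, runs the property against up to 100 generated graphs. On failure, Hypothesis reports a shrunken pair of chains that reproduces the problem.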

For integration tests against DynamoDB and EventBridge, the recommendation was to test against the real AWS services. Mocking libraries may not fully support every service feature, and only real service calls validate IAM permissions and networking configuration. The EventBridge integration required a small test harness: a separate Lambda function subscribed to test events, which persisted received events to a second DynamoDB table. After publishing an event, the test waited briefly, then queried the table to confirm delivery. The session also covered schema validation using Pydantic models mirroring the expected event structure. Calling model_validate on a published event raised a validation error on any breaking schema change, catching contract violations before they reached downstream subscribers.
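The schema check can be sketched with Pydantic v2 as follows. The event shape, the field names, and the helper `is_contract_compatible` are hypothetical; only the use of Pydantic models and model_validate comes from the session.

```python
from pydantic import BaseModel, Field, ValidationError


class TaskCreatedDetail(BaseModel):
    # Assumed payload fields; downstream subscribers rely on all three.
    task_id: str
    title: str
    status: str


class TaskCreatedEvent(BaseModel):
    source: str
    # EventBridge uses the key "detail-type", which is not a valid Python name.
    detail_type: str = Field(alias="detail-type")
    detail: TaskCreatedDetail


def is_contract_compatible(event: dict) -> bool:
    """Return True when the published event still matches the expected schema."""
    try:
        TaskCreatedEvent.model_validate(event)
    except ValidationError:
        return False
    return True


published = {
    "source": "task-api",
    "detail-type": "TaskCreated",
    "detail": {"task_id": "t1", "title": "Write tests", "status": "pending"},
}

# Simulated breaking change: a producer renames "title" to "name".
renamed = {
    "source": "task-api",
    "detail-type": "TaskCreated",
    "detail": {"task_id": "t1", "name": "Write tests", "status": "pending"},
}
```

In an integration test, the event read back from the harness table would be passed through the same check, so a renamed or removed field fails the test before any subscriber sees it.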

Arthi emphasized that failure scenarios matter as much as the success path. In-memory fakes simulated DynamoDB errors like optimistic locking conflicts, and tests confirmed the handler returned the correct HTTP status code with a message that made sense to the client. One specific example: IAM permission errors from DynamoDB should not surface as permission-related messages to callers. A test verified that the handler translated those errors into a generic 500 response.
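A failure-path test of this kind might look like the sketch below. `RepositoryError`, `FailingTaskService`, the handler function, and the exact message text are assumptions for illustration.

```python
import json


class RepositoryError(Exception):
    """Hypothetical error the integration layer raises for infrastructure failures."""


class FailingTaskService:
    """In-memory fake that simulates a permission failure coming from DynamoDB."""

    def get_task(self, task_id):
        raise RepositoryError(
            "AccessDeniedException: not authorized to perform dynamodb:GetItem"
        )


def handle_get_task(event, service):
    try:
        task = service.get_task(event["pathParameters"]["task_id"])
    except RepositoryError:
        # Details belong in server-side logs; the caller gets a generic message.
        return {"statusCode": 500, "body": json.dumps({"message": "Internal server error"})}
    return {"statusCode": 200, "body": json.dumps(task)}


def test_permission_error_is_not_leaked():
    event = {"pathParameters": {"task_id": "task-1"}}
    response = handle_get_task(event, FailingTaskService())
    assert response["statusCode"] == 500
    # Neither the exception name nor the service name reaches the client.
    assert "AccessDenied" not in response["body"]
    assert "dynamodb" not in response["body"].lower()
```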

Conclusion

The patterns from this session address the most common friction points in serverless testing in a deliberate sequence. Decoupling your handler, domain, and integration layers removes the dependency between infrastructure choices and the tests that validate your logic. In-memory fakes with dependency injection give you full control over test behavior without runtime patching. Property-based testing catches edge cases in business logic that handpicked test cases miss. Testing against real AWS services catches the configuration and permission gaps that mocks cannot surface.

Kiro contributed throughout the process, from generating the architectural audit and migration spec to authoring the refactored code and tests. The session closed with the AI-Driven Development Lifecycle (AI-DLC) framework, which structures agentic AI use across inception, construction, and operation phases. After a period in production, Kiro analyzed a collected bug report, produced a risk heat map identifying the highest-pain components, and fed those findings back into a new spec for the next iteration. The result is a feedback loop where operational data continuously informs the next round of improvements.

The complete Task API with the patterns described in this session is available on GitHub via the resources shared at the session.

Session recording: CNS427 - Supercharge Serverless testing: Accelerate development with Kiro