
How AWS Fault Injection Service (FIS) Complements AWS Resilience Hub

5 minute read
Content level: Intermediate

This article explores how AWS Fault Injection Service (FIS) complements AWS Resilience Hub to help teams move from reactive incident response to proactive resilience planning.

What is AWS Fault Injection Service?

AWS Fault Injection Service (FIS) is a fully managed service for running fault injection experiments to improve an application's performance, observability, and resilience. It simplifies the process of setting up and running controlled fault injection experiments across a range of AWS services, so teams can build confidence in their application behavior.

Fault Injection Testing vs. Resilience Testing

Fault injection testing intentionally introduces specific, targeted faults (e.g., added latency, dropped packets) in a controlled environment to verify whether individual components behave as expected.

Resilience testing is the broader strategy focused on validating a system's ability to maintain functionality, recover from disruptions, and ensure overall stability in production.

To support both kinds of testing, FIS provides a Scenario Library of predefined, out-of-the-box experiments that you can use to inject failures into your application.

[Image 1: FIS Scenario Library, showing the out-of-the-box experiments]

How Resilience Hub Integrates with FIS

AWS Resilience Hub's integration with AWS Fault Injection Service enables continuous resilience validation through controlled chaos engineering practices. This integration provides:

  • Proactive Issue Detection — Simulate failures in a controlled manner to identify weaknesses before they impact production systems.
  • Confidence Building — Regular testing builds confidence in your resilience mechanisms and reduces uncertainty about unknown failures.
  • Continuous Improvement — Ongoing refinement of resilience strategies based on real-world testing results.

Continuous Resilience Validation

Automated Test Recommendations

AWS Resilience Hub suggests fault injection experiments based on your application architecture. With a few clicks, you can:

  • Generate FIS experiment templates directly from Resilience Hub recommendations
  • Execute controlled chaos engineering tests in non-production environments
  • Gradually increase test complexity as confidence grows
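To make the idea of an experiment template concrete, here is a minimal hand-written sketch in Python of what such a template can contain. It is not the template that Resilience Hub generates; the tag values, template names, five-minute restart window, and the role and alarm ARNs you pass in are all illustrative assumptions. The request shape follows the boto3 `fis` client's `create_experiment_template` call.

```python
import uuid


def build_stop_instance_template(role_arn, alarm_arn,
                                 tag_key="Environment", tag_value="staging"):
    """Build the request body for an FIS template that stops one tagged EC2 instance.

    The tag key/value and the names "oneInstance" and "stopInstance" are
    hypothetical; adapt them to your own non-production environment.
    """
    return {
        "clientToken": str(uuid.uuid4()),
        "description": "Stop one tagged EC2 instance and restart it after 5 minutes",
        "roleArn": role_arn,
        # Stop the experiment automatically if this CloudWatch alarm fires.
        "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}],
        "targets": {
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",  # limit the blast radius to one instance
            }
        },
        "actions": {
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "oneInstance"},
            }
        },
        "tags": {"Source": "resilience-hub-recommendation"},
    }


def create_template(template):
    """Register the template with FIS (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without AWS access

    fis = boto3.client("fis")
    response = fis.create_experiment_template(**template)
    return response["experimentTemplate"]["id"]
```

Keeping the builder separate from the API call lets you review or unit test the template before anything is created in an account. Starting with `COUNT(1)` and a stop condition is one way to keep early experiments small, in line with increasing complexity only as confidence grows.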

[Image 2: Sample fault injection experiments from Resilience Hub application assessment results]

Building Confidence Through Testing

The integration creates a continuous improvement cycle:

  1. Assess — Resilience Hub identifies gaps in your resilience posture
  2. Improve — Implement recommended infrastructure changes
  3. Test — Use FIS to validate improvements through controlled failures
  4. Monitor — Track resilience metrics over time
  5. Repeat — Continuously refine based on test results

Continuous Monitoring and Drift Detection

  • Regular Assessments — Turn on scheduled assessments so Resilience Hub reassesses the application automatically (the built-in schedule runs daily)
  • Drift Detection — Alert when infrastructure changes negatively impact resilience
  • Compliance Tracking — Monitor resilience posture over time with historical trending
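Resilience Hub provides drift detection itself, so the following is only a small Python sketch of the underlying idea: compare consecutive assessments and flag the ones where compliance regressed. The record fields (`assessmentName`, `endTime`, `complianceStatus`) and the status values `PolicyMet` and `PolicyBreached` are assumed here to mirror the assessment summaries returned by the Resilience Hub API; the sample data in the usage note is made up.

```python
def find_regressions(assessments):
    """Return names of assessments where compliance went from met to breached.

    `assessments` is a list of dicts with "assessmentName", "endTime", and
    "complianceStatus" keys (assumed field names). The list may be unordered.
    """
    ordered = sorted(assessments, key=lambda a: a["endTime"])
    regressions = []
    # Walk consecutive pairs and record each met -> breached transition.
    for previous, current in zip(ordered, ordered[1:]):
        if (previous["complianceStatus"] == "PolicyMet"
                and current["complianceStatus"] == "PolicyBreached"):
            regressions.append(current["assessmentName"])
    return regressions
```

For example, given three daily assessments where only the last one breaches the policy, the function returns the name of that last assessment, which is the point at which an alert would be useful.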

Recovering from Unknown Failures with Confidence

The combination of AWS Resilience Hub and AWS FIS addresses a critical challenge: preparing for failures you haven't experienced yet.

1. Reduced Mean Time to Recovery (MTTR)

  • Pre-validated recovery procedures
  • Automated runbooks tested through FIS experiments
  • Team familiarity with failure scenarios

2. Proactive Issue Identification

  • Discover single points of failure before they cause outages
  • Identify configuration gaps that could extend recovery time
  • Understand cross-service dependencies

3. Organizational Readiness

  • Regular game days using FIS experiments
  • Documentation of recovery procedures
  • Clear understanding of team roles during incidents

4. Data-Driven Decision Making

  • Quantifiable resilience metrics
  • Cost-benefit analysis for resilience investments
  • Executive visibility into application resilience posture

Getting Started with AWS Resilience Hub

Prerequisites

  • IAM roles and permissions — Have an IAM role that grants Resilience Hub access to the resources that make up your application
  • Scheduled assessments — Turning on the scheduled assessment feature requires the AwsResilienceHubPeriodicAssessmentRole IAM role

Step-by-Step Implementation

Week 1–2: Initial Setup

  1. Identify your first business application
  2. Create an application in AWS Resilience Hub
  3. Define your target RPO/RTO requirements
  4. Run your first resilience assessment
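Step 3 above can also be scripted. The sketch below, in Python, converts RTO/RPO targets expressed in minutes into the seconds-based structure that the boto3 `resiliencehub` client's `create_resiliency_policy` call expects. The function names, the example targets, and the default `Critical` tier are illustrative assumptions, not recommendations.

```python
# Disruption types that a Resilience Hub resiliency policy can cover.
DISRUPTION_TYPES = {"Software", "Hardware", "AZ", "Region"}


def build_failure_policy(targets_in_minutes):
    """Convert {disruption: (rto_minutes, rpo_minutes)} into the API's shape."""
    unknown = set(targets_in_minutes) - DISRUPTION_TYPES
    if unknown:
        raise ValueError(f"Unknown disruption types: {sorted(unknown)}")
    return {
        disruption: {"rtoInSecs": rto * 60, "rpoInSecs": rpo * 60}
        for disruption, (rto, rpo) in targets_in_minutes.items()
    }


def create_policy(name, targets_in_minutes, tier="Critical"):
    """Create the policy in Resilience Hub (needs boto3 and AWS credentials)."""
    import boto3  # imported here so build_failure_policy works without AWS access

    client = boto3.client("resiliencehub")
    response = client.create_resiliency_policy(
        policyName=name,
        tier=tier,
        policy=build_failure_policy(targets_in_minutes),
    )
    return response["policy"]["policyArn"]
```

A call such as `build_failure_policy({"Software": (60, 15), "AZ": (30, 5)})` expresses a 60-minute RTO and 15-minute RPO for software disruptions, and tighter targets for an Availability Zone disruption.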

Week 3–4: Analysis and Planning

  1. Review assessment findings with stakeholders
  2. Prioritize recommendations based on cost and impact
  3. Create an implementation roadmap
  4. Obtain necessary approvals and budget

Week 5–8: Implementation

  1. Implement high-priority recommendations
  2. Re-run assessments to validate improvements
  3. Document changes and updated procedures
  4. Train teams on new configurations

Week 9–12: Testing and Validation

  1. Design FIS experiments based on Resilience Hub recommendations
  2. Execute tests in non-production environments
  3. Measure actual RPO/RTO vs. targets
  4. Refine and iterate based on results
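Measuring actual RPO/RTO against targets comes down to simple timestamp arithmetic once an experiment has been observed. The Python sketch below assumes you record three timestamps per experiment (when the fault was injected, when service was restored, and the time of the last recoverable data point); the class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ExperimentObservation:
    fault_injected_at: datetime
    service_restored_at: datetime
    last_recoverable_data_at: datetime  # e.g. last backup or replicated write


def evaluate(observation, target_rto, target_rpo):
    """Compare the measured RTO and RPO of one experiment against the targets."""
    # RTO: how long the service was unavailable after the fault.
    measured_rto = observation.service_restored_at - observation.fault_injected_at
    # RPO: how much data (in time) would have been lost at the moment of the fault.
    measured_rpo = observation.fault_injected_at - observation.last_recoverable_data_at
    return {
        "measured_rto": measured_rto,
        "measured_rpo": measured_rpo,
        "rto_met": measured_rto <= target_rto,
        "rpo_met": measured_rpo <= target_rpo,
    }


if __name__ == "__main__":
    observation = ExperimentObservation(
        fault_injected_at=datetime(2024, 1, 1, 10, 0),
        service_restored_at=datetime(2024, 1, 1, 10, 12),
        last_recoverable_data_at=datetime(2024, 1, 1, 9, 45),
    )
    print(evaluate(observation, timedelta(minutes=15), timedelta(minutes=10)))
```

In this made-up run the service recovers in 12 minutes, inside a 15-minute RTO target, but the last recoverable data is 15 minutes old, which misses a 10-minute RPO target. That kind of gap is what step 4 then feeds back into the next iteration.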

Ongoing: Continuous Improvement

  1. Schedule regular resilience assessments
  2. Conduct quarterly chaos engineering experiments
  3. Monitor for configuration drift
  4. Update resilience policies as business needs evolve

Note: Implementation and testing timelines will vary with each organization's internal processes and approval workflows.


Conclusion

AWS Resilience Hub transforms application resilience from a reactive concern to a proactive strategy. By automatically discovering your current RPO/RTO capabilities, providing actionable recommendations with cost transparency, and integrating with AWS Fault Injection Service for continuous validation, it empowers you to:

  • Move from assumptions to evidence-based resilience planning
  • Justify resilience investments with clear cost-benefit analysis
  • Build organizational confidence through regular testing
  • Maintain continuous compliance with business requirements

Don't wait for an outage to discover your application's true resilience capabilities. Start your journey with AWS Resilience Hub today and build the confidence to recover from any failure — known or unknown.


Next Steps

  • Schedule a 30-minute resilience assessment workshop with your team
  • Identify your top 3 business-critical applications for initial assessment
