RabbitMQ B/G Deployment feedback

So I've been working on this stale RabbitMQ module, and I've never built a B/G deploy before. This is how I set it up in relation to our existing architecture. The decisions were made so that running a RabbitMQ deploy would only require a commit of the version tag. I decided to store the active color in an SSM parameter and read it through a data source as the color value for the module. That lets SSM be the source of truth, so we can change the color and apply it without committing to the repo or tf state. It's working /fine/ but I'm wondering if there are improvements to be made, or if I'm way off base.

1. BEFORE (blue=active, green=idle at 0 instances)
   ┌──────┐     ┌──────┐
   │ BLUE │◄──  │ NLB  │    GREEN: 0 instances
   │ 3.11 │     └──────┘
   └──────┘

2. Scale up green with new version
   ┌──────┐     ┌──────┐     ┌──────┐
   │ BLUE │◄──  │ NLB  │     │GREEN │
   │ 3.11 │     └──────┘     │ 3.12 │
   └──────┘                  └──────┘

3. Export definitions from blue, import to green
   (use SSM scripts: export_definitions.sh, import_definitions.sh)

4. Switch active_color to green (SSM param + terraform apply)
   ┌──────┐     ┌──────┐     ┌──────┐
   │ BLUE │     │ NLB  │──►  │GREEN │
   │ 3.11 │     └──────┘     │ 3.12 │
   └──────┘                  └──────┘

5. Verify green is healthy, then scale blue to 0
   ┌──────┐
   │GREEN │◄── NLB     BLUE: 0 instances
   │ 3.12 │
   └──────┘

              ┌────────────▼──────────────▼────────────┐
              │        active_color switch             │
              │  (NLB listener default action)         │
              │                                        │
              │  active_color="blue"  → blue TGs       │
              │  active_color="green" → green TGs      │
              └───────┬───────────────────┬────────────┘
                      │                   │
         ┌────────────▼────────┐ ┌────────▼──────────────┐
         │  BLUE Target Groups │ │  GREEN Target Groups  │
         │                     │ │                       │
         │ node-b  (:5672)     │ │ node-g  (:5672)       │
         │ mgmt-b  (:15672)    │ │ mgmt-g  (:15672)      │
         └────────────┬────────┘ └────────┬──────────────┘
                      │                   │
         ┌────────────▼────────┐ ┌────────▼──────────────┐
         │   BLUE ASG          │ │   GREEN ASG           │
         │   {name}-blue       │ │   {name}-green        │
         │                     │ │                       │
         │ ┌─────┐┌─────┐┌───┐ │ │ ┌─────┐┌─────┐┌─────┐ │
         │ │ EC2 ││ EC2 ││EC2│ │ │ │ EC2 ││ EC2 ││ EC2 │ │
         │ │node1││node2││ n3│ │ │ │node1││node2││ n3  │ │
         │ └─────┘└─────┘└───┘ │ │ └─────┘└─────┘└─────┘ │
         │                     │ │                       │
         │ Cluster via ASG     │ │ Cluster via ASG       │
         │ peer discovery      │ │ peer discovery        │
         └─────────────────────┘ └───────────────────────┘

         ┌─────────────────────────────────────────────┐
         │  SSM Parameter Store                        │
         │  /{name}/RMQ_ACTIVE_COLOR = "blue"|"green"  │
         │  (lifecycle: ignore_changes on value)       │
         └─────────────────────────────────────────────┘
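
For anyone who prefers to read the sequence as code, the five steps above can be sketched roughly like this. It is only an illustration of the ordering, not the module itself: `idle_color` and `deployment_plan` are hypothetical helper names, and the default size of 3 is assumed from the three nodes per ASG in the diagram.

```python
COLORS = ("blue", "green")


def idle_color(active: str) -> str:
    """Return the color that is NOT currently serving traffic."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return "green" if active == "blue" else "blue"


def deployment_plan(active: str, new_version: str, size: int = 3) -> list[str]:
    """Describe steps 2-5 plus the final scale-down, in order."""
    target = idle_color(active)
    return [
        f"scale {target} ASG to {size} instances on version {new_version}",
        f"export definitions from {active}, import into {target}",
        f"set RMQ_ACTIVE_COLOR to {target} and apply (NLB forwards to {target} TGs)",
        f"verify {target} is healthy",
        f"scale {active} ASG to 0 instances",
    ]


if __name__ == "__main__":
    for number, step in enumerate(deployment_plan("blue", "3.12"), start=1):
        print(f"{number}. {step}")
```

The point of keeping the old color up until the last step is that a rollback before that step is only a change of the active color.
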
asked 2 months ago, 42 views
1 Answer

Your RabbitMQ blue/green deployment architecture is well-thought-out and follows solid principles. Here's my feedback:

What you're doing well:

  1. SSM as source of truth - Using SSM Parameter Store to control the active color is a smart approach. It decouples the deployment state from your Terraform state, allowing traffic switching without code commits.

  2. Zero-instance idle environment - Scaling the idle environment to 0 instances is cost-effective and makes sense for RabbitMQ where you're not serving stateless requests.

  3. NLB listener routing - Using the NLB's listener default action to route based on active color provides a clean traffic switching mechanism.

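  4. Definition export/import - Exporting definitions from blue and importing them into green is the right way to carry over RabbitMQ's topology (queues, exchanges, bindings, users, and policies). Keep in mind that definitions do not include message contents, so anything still queued on blue has to be drained or consumed separately.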
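
To make points 1 and 3 concrete, here is a minimal Python sketch of reading the active color and mapping it to the target groups from your diagram. It is an illustration under assumptions, not your module: the `get_parameter(Name=...)` call and response shape mirror the boto3 SSM client, `FakeSSM` is a hypothetical stand-in so the example runs without AWS access, and `/example/RMQ_ACTIVE_COLOR` is a made-up parameter name following your `/{name}/RMQ_ACTIVE_COLOR` pattern.

```python
VALID_COLORS = ("blue", "green")


def read_active_color(ssm_client, parameter_name: str) -> str:
    """Read the active color from SSM and reject anything unexpected."""
    response = ssm_client.get_parameter(Name=parameter_name)
    color = response["Parameter"]["Value"].strip().lower()
    if color not in VALID_COLORS:
        raise ValueError(f"{parameter_name} holds {color!r}, expected one of {VALID_COLORS}")
    return color


def active_target_groups(color: str) -> dict:
    """Map listener port to target group, using the names from the diagram."""
    suffix = {"blue": "b", "green": "g"}[color]
    return {5672: f"node-{suffix}", 15672: f"mgmt-{suffix}"}


class FakeSSM:
    """Hypothetical stand-in for an SSM client, used only for this demo."""

    def __init__(self, value: str):
        self._value = value

    def get_parameter(self, Name: str) -> dict:
        return {"Parameter": {"Name": Name, "Value": self._value}}


if __name__ == "__main__":
    color = read_active_color(FakeSSM("green"), "/example/RMQ_ACTIVE_COLOR")
    print(color, active_target_groups(color))
```

Validating the value before it reaches the listener is worth keeping in whatever form you use, since a typo in the parameter would otherwise surface only at apply time.
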

Potential improvements to consider:

  1. Validation before switchover - Before switching the active color in SSM, ensure you have automated health checks and smoke tests running against the green environment. This could include verifying queue connectivity, message flow, and cluster health.

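  2. Gradual traffic shifting - Instead of an immediate cutover, consider a canary approach where a small share of new connections goes to green first, for example through weighted target groups if your load balancer setup supports them, and complete the switch once monitoring looks clean. Keep in mind that AMQP connections are long-lived, so a weight change only affects new connections, and that blue and green are separate clusters, so producers and consumers split across them will not see each other's messages.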

  3. Rollback automation - Document or script the rollback procedure. If issues arise after switching to green, you want a quick, tested way to revert the SSM parameter and scale blue back up.

  4. Monitoring during transition - Implement comprehensive monitoring during the switchover period. Track message rates, connection counts, and error rates on both environments to catch issues quickly.

  5. Connection draining - When scaling down the old environment, ensure proper connection draining so existing client connections aren't abruptly terminated. RabbitMQ clients should be configured to handle reconnection gracefully.

  6. State synchronization window - Consider the timing between exporting definitions from blue and importing to green. If there's active traffic, you might miss newly created queues or bindings. You may want to implement a brief traffic pause or have a process to sync any delta.
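
For point 6, one way to catch a delta is to export definitions from both clusters right before the switch and compare them. The sketch below assumes the usual layout of a RabbitMQ definitions export (top-level `queues`, `exchanges`, and `bindings` lists with `vhost`, `name`, `source`, `destination`, and `routing_key` fields); `definitions_delta` is a hypothetical helper and the sample data is invented.

```python
# Fields that identify an object within each section of a definitions export.
SECTIONS = {
    "queues": ("vhost", "name"),
    "exchanges": ("vhost", "name"),
    "bindings": ("vhost", "source", "destination", "destination_type", "routing_key"),
}


def _identities(definitions: dict, section: str, fields: tuple) -> set:
    """Collect one identifying tuple per object in the given section."""
    return {
        tuple(str(item.get(field)) for field in fields)
        for item in definitions.get(section, [])
    }


def definitions_delta(blue: dict, green: dict) -> dict:
    """Return objects present in blue but missing from green, per section."""
    delta = {}
    for section, fields in SECTIONS.items():
        missing = _identities(blue, section, fields) - _identities(green, section, fields)
        if missing:
            delta[section] = sorted(missing)
    return delta


if __name__ == "__main__":
    blue = {"queues": [{"vhost": "/", "name": "orders"},
                       {"vhost": "/", "name": "orders.retry"}]}
    green = {"queues": [{"vhost": "/", "name": "orders"}]}
    print(definitions_delta(blue, green))
```

An empty result means nothing was created on blue after the export; a non-empty one tells you what to re-import before flipping the color.
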

Your architecture isn't "way off base" at all - it's a practical implementation that balances automation, cost, and operational simplicity. The main areas to enhance would be around validation, monitoring, and graceful transitions to further reduce risk during deployments.
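
As a rough illustration of the validation gate, the check below refuses to switch unless the new cluster looks healthy. It assumes you have already fetched the node list from the RabbitMQ management API (`/api/nodes`) as a list of dicts with `running`, `mem_alarm`, and `disk_free_alarm` fields; `switch_blockers` and `decide_color` are hypothetical names, and a real gate would add smoke tests for message flow.

```python
def switch_blockers(nodes: list, expected_nodes: int) -> list:
    """Return human-readable reasons not to switch; empty means go ahead."""
    blockers = []
    running = [node for node in nodes if node.get("running")]
    if len(running) < expected_nodes:
        blockers.append(f"only {len(running)} of {expected_nodes} nodes running")
    for node in nodes:
        for alarm in ("mem_alarm", "disk_free_alarm"):
            if node.get(alarm):
                blockers.append(f"{node.get('name', '?')}: {alarm} is set")
    return blockers


def decide_color(active: str, candidate: str, blockers: list) -> str:
    """Stay on the active color unless the candidate has no blockers."""
    return active if blockers else candidate


if __name__ == "__main__":
    green_nodes = [
        {"name": "rabbit@node1", "running": True},
        {"name": "rabbit@node2", "running": True},
        {"name": "rabbit@node3", "running": False},
    ]
    problems = switch_blockers(green_nodes, expected_nodes=3)
    print(decide_color("blue", "green", problems), problems)
```

The same function can run again after the switch, with the colors swapped, to decide whether to roll back.
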

answered 2 months ago
EXPERT
reviewed 2 months ago
