I understand your OpenSearch Service domain is stuck in a blue/green deployment with 6 nodes running instead of the target 2 nodes, even though shard migration appears complete and the cluster is healthy.
This situation occurs when the domain enters a "Processing" or "Modifying" state during configuration changes. During blue/green deployments, OpenSearch Service creates a new environment with new nodes and migrates shards from the old nodes to the new ones. The domain can get stuck in this state if:
- The shard relocation process encounters issues, even if shards appear healthy
- Validation checks fail with errors
- The system is unable to properly terminate the old nodes after migration
Several factors can cause shard relocation to stall during blue/green deployments:
- Large shard sizes (over 50 GB) that increase copy time
- High index or search load causing elevated resource usage
- Insufficient disk space on new nodes
- Reaching shard count quotas
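As a rough way to check the first factor, the sketch below filters rows from `GET _cat/shards?format=json&bytes=b` for shards above 50 GB. The helper name `oversized_shards` and the sample rows are illustrative assumptions. How you fetch the rows (signed request, fine-grained access control credentials, and so on) depends on your domain's access policy and is left out.

```python
GIB = 1024 ** 3


def oversized_shards(rows, limit_gb=50):
    """Return (index, shard, prirep, size_gb) for shards larger than limit_gb.

    `rows` is the parsed JSON list from _cat/shards?format=json&bytes=b,
    where "store" is the shard size in bytes as a string.
    """
    found = []
    for row in rows:
        store = row.get("store")
        if store is None:  # unassigned shards report no size
            continue
        size = int(store)
        if size > limit_gb * GIB:
            found.append((row["index"], row["shard"], row["prirep"],
                          round(size / GIB, 1)))
    return found


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"index": "logs-2024", "shard": "0", "prirep": "p", "store": str(60 * GIB)},
    {"index": "logs-2024", "shard": "1", "prirep": "p", "store": str(10 * GIB)},
    {"index": "metrics", "shard": "0", "prirep": "r", "store": None},
]

for index, shard, prirep, size_gb in oversized_shards(sample_rows):
    print(f"{index} shard {shard} ({prirep}): {size_gb} GiB")
```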
To help diagnose the root cause, I recommend:
- Monitor CloudWatch metrics for your domain, specifically:
  - FreeStorageSpace
  - ClusterStatus
  - ClusterIndexWritesBlocked
  - CPUUtilization
  - JVMMemoryPressure
- Check the domain description in the console or via API for any failed activities or validation-related error messages
- Monitor shard migration using the _cat/recovery API to see if any recovery operations are still in progress
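For the last check, a minimal sketch of how the output of `GET _cat/recovery?format=json` could be filtered for recoveries that have not finished. It assumes each row carries a `stage` field that reads `done` once a recovery completes. The function name and the sample rows are illustrative, and fetching the rows from your domain endpoint is not shown.

```python
def unfinished_recoveries(rows):
    """Return the _cat/recovery rows whose stage is not 'done'."""
    return [row for row in rows if row.get("stage") != "done"]


def describe_recovery(row):
    """Format one recovery row as a short human-readable line."""
    return (f"{row.get('index')} shard {row.get('shard')}: "
            f"{row.get('stage')} ({row.get('source_node')} -> "
            f"{row.get('target_node')})")


# Hypothetical sample rows, for illustration only.
sample_recoveries = [
    {"index": "logs-2024", "shard": "0", "stage": "done",
     "source_node": "old-node-1", "target_node": "new-node-1"},
    {"index": "logs-2024", "shard": "1", "stage": "index",
     "source_node": "old-node-2", "target_node": "new-node-2"},
]

pending = unfinished_recoveries(sample_recoveries)
if pending:
    for row in pending:
        print(describe_recovery(row))
else:
    print("No recoveries in progress")
```

An empty result here is consistent with a migration that has finished moving data, which narrows the problem down to the later cleanup phase.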
Since you've confirmed all shards are STARTED and there are no pending tasks, this suggests the migration completed but the cleanup phase failed to proceed. Unfortunately, I cannot manually intervene to complete the deployment cleanup, as this requires AWS Support access to your account's internal systems.
I strongly recommend opening an AWS Support case with the details you've provided (domain name, region, account ID, and Change ID). The support team can investigate the stuck deployment state and safely complete the cleanup process to remove the old nodes and finalize the configuration change.
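Before opening the case, it may help to capture the current state of the change. The sketch below uses the boto3 `opensearch` client's `describe_domain_change_progress` call and condenses the response into a few lines you can paste into the case. The helper names are my own, and the domain name, Change ID, region, and stage names in the sample are placeholders, not values from your domain.

```python
def summarize_change_progress(response):
    """Condense a DescribeDomainChangeProgress response into text lines."""
    status = response.get("ChangeProgressStatus", {})
    lines = [f"Change {status.get('ChangeId')}: {status.get('Status')}"]
    for stage in status.get("ChangeProgressStages", []):
        lines.append(f"  {stage.get('Name')}: {stage.get('Status')}")
    return lines


def fetch_change_progress(domain_name, change_id, region):
    """Call the service; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the summarizer works without boto3

    client = boto3.client("opensearch", region_name=region)
    return client.describe_domain_change_progress(
        DomainName=domain_name, ChangeId=change_id
    )


# Hypothetical response shape with placeholder values, for illustration only.
sample_response = {
    "ChangeProgressStatus": {
        "ChangeId": "example-change-id",
        "Status": "PROCESSING",
        "ChangeProgressStages": [
            {"Name": "Example stage A", "Status": "COMPLETED"},
            {"Name": "Example stage B", "Status": "IN_PROGRESS"},
        ],
    }
}

for line in summarize_change_progress(sample_response):
    print(line)
```

To run it against a real domain, pass the result of `fetch_change_progress("my-domain", "my-change-id", "us-east-1")` (all three arguments are placeholders) to `summarize_change_progress`.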
In the meantime, avoid making additional configuration changes to the domain, as this could complicate the current stuck state.
Sources
Troubleshooting Amazon OpenSearch Service - Amazon OpenSearch Service
Troubleshoot stuck or failed upgrades in OpenSearch Service | AWS re:Post

The re:Post Agent's earlier answer is accurate. However, since the cluster status is green and all shards are reported internally as STARTED, this is clearly a stuck workflow in the AWS control plane. The suggested CloudWatch metrics are helpful for verification, but as the re:Post Agent noted, only AWS Support can manually clear the Change ID and terminate the legacy nodes.