Based on the situation you've described, this requires AWS-side intervention. Unfortunately, you cannot cancel an upgrade or service software update after it has been initiated - you must wait for AWS to resolve the underlying issue before the process can complete.
Your domain has been stuck in the "UpdatingServiceSoftware" state for months, in the shard relocation phase of the blue/green deployment, and you've already verified that:
- The cluster is healthy
- Shards have actually completed copying
- The update is marked as non-cancellable
- You cannot perform any domain configuration updates
Taken together, this points to the OpenSearch Service control plane failing to detect that the deployment has completed, so it never proceeds to the final stage of deleting the older resources.
Having 4 nodes instead of your configured 2 is consistent with this: the blue/green deployment never completed its cleanup phase, leaving both the old and new environments running.
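If it helps to capture this evidence in one place before opening a case, here is a minimal Python sketch. It assumes the `DomainStatus` structure returned by boto3's `describe_domain` call on the `opensearch` client, in particular the `DomainProcessingStatus`, `ServiceSoftwareOptions.Cancellable`, and `ClusterConfig.InstanceCount` fields; check those names against your SDK version. The helper names `summarize_stuck_update` and `fetch_domain_status` are made up for illustration, and the observed node count is something you supply yourself (for example from the cluster health output).

```python
from typing import Any, Dict, List


def summarize_stuck_update(domain_status: Dict[str, Any],
                           observed_node_count: int) -> List[str]:
    """Summarize signs of a stuck service software update.

    `domain_status` is assumed to be the "DomainStatus" dict from
    DescribeDomain; `observed_node_count` is the node count you see
    on the cluster itself.
    """
    findings: List[str] = []

    # Domain-level processing state (assumed field name).
    if domain_status.get("DomainProcessingStatus") == "UpdatingServiceSoftware":
        findings.append("domain is in UpdatingServiceSoftware state")

    # Whether the pending update can still be cancelled.
    software = domain_status.get("ServiceSoftwareOptions", {})
    if software.get("Cancellable") is False:
        findings.append("update is marked non-cancellable")

    # More running nodes than configured suggests that the old and new
    # environments of the blue/green deployment both still exist.
    configured = domain_status.get("ClusterConfig", {}).get("InstanceCount")
    if configured is not None and observed_node_count > configured:
        findings.append(
            f"{observed_node_count} nodes running, {configured} configured"
        )

    return findings


def fetch_domain_status(domain_name: str) -> Dict[str, Any]:
    """Fetch the DomainStatus dict; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the summary helper works without it

    client = boto3.client("opensearch")
    return client.describe_domain(DomainName=domain_name)["DomainStatus"]
```

For example, `summarize_stuck_update(fetch_domain_status("my-domain"), 4)` (with `my-domain` as a placeholder name) would list whichever of these conditions hold, which you can paste into the support case.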
You should contact AWS Support to resolve this issue. They will need to:
- Investigate why the deployment state machine failed to progress beyond the shard copying stage
- Manually complete or cancel the stuck deployment
- Clean up the orphaned resources from the incomplete blue/green deployment
- Restore your ability to perform domain configuration updates
Since you mentioned this appears to be a recurring pattern on this domain, ask AWS Support to also investigate the root cause so the issue does not recur during future service software updates.
Sources
Troubleshoot stuck or failed upgrades in OpenSearch Service | AWS re:Post
