OpenSearch Service software update stuck in "Initializing" state for over 24 hours

We are facing an issue with our Amazon OpenSearch Service domain.

Details:

  • Domain name: ghn-anon-elasticsearch
  • Region: Ireland (eu-west-1)
  • Update trigger time: yesterday, 5th May, around 11:30 AM IST (India Standard Time)

Issue: The service software update has been stuck for over 24 hours with:

  • Domain processing status: Updating service software
  • Configuration change status: Initializing

We are currently unable to perform any configuration changes or operations on the domain as it appears to be locked in this state.

The cluster appears healthy from our checks, but the update is not progressing.

We have also attempted all possible actions via the AWS Console and AWS CLI to cancel or roll back the service software update, but none of these operations are working.

Could you please investigate and help unblock or reset the update process?

Please let us know if any additional details are required.

Thanks, Shubham

asked 8 days ago · 39 views
1 Answer

I understand your OpenSearch Service domain has been stuck in the "Initializing" state during a service software update for over 24 hours, which is preventing you from making any configuration changes.

Unfortunately, once an upgrade or update has been initiated in OpenSearch Service, it cannot be cancelled. OpenSearch Service uses a blue/green deployment process for domain upgrades and updates, and if the process gets stuck, you must wait for AWS to resolve the underlying issue before it can complete.

During the blue/green deployment process, several factors can cause the domain to get stuck in a processing state, including:

  • Issues with shard relocation
  • Large shard sizes
  • High index or search load on the domain
  • Insufficient disk space
  • Reaching shard count quotas
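
If the domain endpoint still answers requests, the shard-related causes above can be checked from the output of `GET _cat/shards?format=json&bytes=b`. The sketch below is only an illustration: `summarize_shards` is a hypothetical helper, and the 50 GiB threshold is an assumed cut-off for "large", not a service limit. It counts shards per state (so `RELOCATING`, `INITIALIZING`, or `UNASSIGNED` shards stand out) and lists shards above the threshold.

```python
from collections import Counter

# Assumed, illustrative cut-off for a "large" shard; tune it for your workload.
LARGE_SHARD_BYTES = 50 * 1024**3


def summarize_shards(rows, large_bytes=LARGE_SHARD_BYTES):
    """Summarize rows returned by GET _cat/shards?format=json&bytes=b."""
    # Count shards per state, e.g. STARTED, RELOCATING, INITIALIZING, UNASSIGNED.
    states = Counter(row["state"] for row in rows)
    large = sorted(
        (row["index"], int(row["shard"]), int(row["store"]))
        for row in rows
        # Unassigned shards report no store size, so skip empty values.
        if row.get("store") and int(row["store"]) > large_bytes
    )
    return {"total": len(rows), "states": dict(states), "large": large}
```

Pass it the parsed JSON list from the `_cat/shards` response. A non-zero `RELOCATING` count that never changes between runs, or a long `large` list, is the kind of detail worth including in a support case.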

To help monitor the situation while waiting for resolution, you can:

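  • Use Amazon CloudWatch to check OpenSearch Service metrics such as "FreeStorageSpace", "ClusterStatus.red" (along with "ClusterStatus.yellow" and "ClusterStatus.green"), and "ClusterIndexWritesBlocked"
  • Monitor shard migration with the "_cat/recovery" API (for example, "GET _cat/recovery?v&active_only=true"), provided the domain endpoint is still reachable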
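
As a rough sketch of the CloudWatch check, the code below builds `get_metric_statistics` requests for the three metrics over the last 24 hours and returns the latest value of each. OpenSearch Service publishes its metrics in the `AWS/ES` namespace with the `DomainName` and `ClientId` (your AWS account ID) dimensions. The helper names `build_request` and `latest_values` are hypothetical, and the `cloudwatch` argument is assumed to be a boto3 CloudWatch client that you create yourself.

```python
from datetime import datetime, timedelta, timezone

# (metric name, statistic) pairs to query in the AWS/ES namespace.
METRICS = [
    ("FreeStorageSpace", "Minimum"),
    ("ClusterStatus.red", "Maximum"),
    ("ClusterIndexWritesBlocked", "Maximum"),
]


def build_request(metric, stat, domain, account_id, hours=24, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # 5-minute buckets: 288 datapoints per 24 hours
        "Statistics": [stat],
    }


def latest_values(cloudwatch, domain, account_id):
    """Return the most recent datapoint per metric, or None if there is none."""
    result = {}
    for metric, stat in METRICS:
        resp = cloudwatch.get_metric_statistics(
            **build_request(metric, stat, domain, account_id)
        )
        points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
        result[metric] = points[-1][stat] if points else None
    return result
```

With boto3 installed, you could call `latest_values(boto3.client("cloudwatch", region_name="eu-west-1"), "ghn-anon-elasticsearch", "<your-account-id>")`. A value of 1 for "ClusterStatus.red" or "ClusterIndexWritesBlocked", or a very low "FreeStorageSpace", points at one of the causes listed earlier.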

Since you've been stuck for over 24 hours and cannot cancel the operation, I recommend opening a support case with AWS Support immediately. They will need to investigate the underlying issue causing the update to remain stuck and can take the necessary actions to resolve it. When opening the case, provide all the details you've mentioned including the domain name, region, and the exact time the update was initiated.
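
To gather those details in one place, a small sketch like the following could pull the update-related fields from the DescribeDomain API. `summarize_domain` is a hypothetical helper, and the `opensearch` argument is assumed to be a boto3 OpenSearch Service client. Some fields (for example "DomainProcessingStatus" and "ChangeProgressDetails") may be missing with older SDK versions, so the code reads every field defensively.

```python
def summarize_domain(opensearch, domain_name):
    """Collect update-related fields from DescribeDomain for a support case."""
    status = opensearch.describe_domain(DomainName=domain_name)["DomainStatus"]
    software = status.get("ServiceSoftwareOptions", {})
    change = status.get("ChangeProgressDetails", {})
    return {
        "domain": status.get("DomainName", domain_name),
        "processing": status.get("Processing"),
        "domain_processing_status": status.get("DomainProcessingStatus"),
        "update_status": software.get("UpdateStatus"),
        "current_version": software.get("CurrentVersion"),
        "new_version": software.get("NewVersion"),
        # Whether the service currently reports the update as cancellable.
        "cancellable": software.get("Cancellable"),
        "change_id": change.get("ChangeId"),
        "config_change_status": change.get("ConfigChangeStatus"),
        "change_start_time": change.get("StartTime"),
    }
```

With boto3 installed, `summarize_domain(boto3.client("opensearch", region_name="eu-west-1"), "ghn-anon-elasticsearch")` returns a dictionary you can paste into the case. The change ID in particular helps Support locate the stuck deployment.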

In the future, to help prevent similar issues, consider initiating updates during periods of low cluster usage and ensuring your domain has sufficient resources to handle the increased load during an update.

Sources
Troubleshoot stuck or failed upgrades in OpenSearch Service | AWS re:Post

answered 8 days ago
EXPERT
reviewed 8 days ago