What can I do when an Amazon OpenSearch instance has been stuck in an update for more than a week?
Service Name: Amazon OpenSearch Service
Version Name: Elasticsearch 6.3
Service software version: updating to R20211203-P2 (latest)
Start time: 16/12/2021, 17:11:14 (UTC), checked from the Notifications tab
Cluster Status: green on 16-17/12, then red from 18-29/12. I then restored it from a healthy automated snapshot, and it has been green ever since.
Instance Status: processing
Data node: 2 (desired to be 1)
What can I do in this situation? The status is stuck in processing.
Is there any way to handle it? I would like to upgrade from 6.3 to 6.8 and then to OpenSearch.
Remarks:
I have another production instance that finished the service software update within 1 hour.
Good question. Sorry to hear about your issues!
AWS recommends resolving the red cluster status before reconfiguring your OpenSearch Service domain. A red cluster status means that at least one primary shard and its replicas are not allocated to a node. OpenSearch Service keeps trying to take automated snapshots, but they fail while the red cluster status persists.
The two most common causes are failed cluster nodes and the OpenSearch process crashing under a heavy load. The following requests can help identify the issue:
- GET /_cluster/allocation/explain
- GET /_cat/indices?v
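As a rough illustration, the two checks above could be scripted along these lines. This is only a sketch: it assumes the domain accepts unsigned requests (as the plain curl calls in this thread suggest), `https://xx` is the placeholder endpoint from the thread, and the helper names are made up for the example.

```python
import json
import urllib.request

ENDPOINT = "https://xx"  # placeholder endpoint, replace with your own domain


def fetch_json(path, endpoint=ENDPOINT):
    """GET a path on the domain and decode the JSON response body."""
    with urllib.request.urlopen(endpoint + path, timeout=30) as resp:
        return json.load(resp)


def non_green_indices(cat_rows):
    """Return (index, health) pairs for every index that is not green.

    cat_rows is the list of dicts returned by /_cat/indices?format=json.
    """
    return [(row["index"], row["health"])
            for row in cat_rows
            if row.get("health") != "green"]


def diagnose(endpoint=ENDPOINT):
    """List unhealthy indices and, if there are any, ask why shards are unassigned."""
    rows = fetch_json("/_cat/indices?format=json", endpoint)
    problems = non_green_indices(rows)
    if not problems:
        return problems, None
    # Only call allocation explain when something is unhealthy: with no
    # unassigned shards the API answers with an error instead of an explanation.
    explanation = fetch_json("/_cluster/allocation/explain", endpoint)
    return problems, explanation
```

Calling `diagnose()` against a healthy domain would return an empty list and `None`, meaning the cluster itself is not what is holding things up.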
From that info, you can try the following:
- Delete the problematic index
- Restore a snapshot
- Delete documents from the index
- Change the index settings
- Reduce the number of replicas
- Delete other indices to free up disk space.
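For example, reducing the number of replicas comes down to a single settings update on the index. The sketch below only builds and sends that request; the endpoint and index name are placeholders, and it again assumes the domain accepts unsigned requests.

```python
import json
import urllib.request


def build_replica_request(endpoint, index, replicas):
    """Build a PUT /<index>/_settings request that sets number_of_replicas."""
    body = {"index": {"number_of_replicas": replicas}}
    return urllib.request.Request(
        f"{endpoint}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


# Example (hypothetical endpoint and index name):
# send(build_replica_request("https://xx", "my-index", 0))
```

Setting replicas to 0 frees disk space and removes unassigned replica shards, but it also removes redundancy, so it may be worth raising the value again once the cluster is healthy.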
For failed cluster nodes or more complex issues that aren't solvable by the above, you can reach out to AWS Support for more help as detailed here: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/handling-errors.html
curl -XGET 'https://xx/_cat/indices?v'
health status index   uuid pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana xx     1   1          2            0     18.7kb          9.3kb
green  open   book    xx     5   0      52598          150     24.1mb         24.1mb

curl -XGET 'https://xx/_cluster/allocation/explain?pretty'
"reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequestuseAnyUnassignedShard=true,includeYesDecisions?=false"
Both suggested checks look normal at the moment, so I don't know why it's still stuck in processing.
Follow-up on my question: I finally solved it by creating a new instance with the same Elasticsearch version, restoring the snapshot from the old instance to the new one, and then removing the old one (which was stuck in processing). It turned out to be fine to remove an Elasticsearch instance while it is still processing. The only change is the name of the instance (you might need to recreate it again to reuse the old one's name).
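For anyone following the same route, the restore step on the new instance could look roughly like the sketch below. It is not the exact procedure I ran: it assumes the snapshot sits in a repository that is already registered on the new domain, that the domain accepts unsigned requests, and the endpoint, repository, and snapshot names are all hypothetical.

```python
import json
import urllib.request


def build_restore_request(endpoint, repository, snapshot, indices):
    """Build a POST /_snapshot/<repository>/<snapshot>/_restore request."""
    body = {
        # Restore only the named indices rather than everything in the snapshot.
        "indices": ",".join(indices),
        # Leave the cluster-wide state of the new domain untouched.
        "include_global_state": False,
    }
    return urllib.request.Request(
        f"{endpoint}/_snapshot/{repository}/{snapshot}/_restore",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


# Example (all names hypothetical):
# send(build_restore_request("https://new-domain", "my-repo", "my-snapshot", ["book"]))
```

Restoring only the data indices (here `book`) avoids name clashes with indices such as `.kibana` that already exist on the freshly created domain.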