
AWS ElasticSearch Stuck Upgrade processing after 3.5 hours


I am trying to upgrade from 7.1 to 7.4 and my Elasticsearch Service domain is stuck. What can I do?

I ran the pre-upgrade check and it passed.

Upgrade from 7.1 to 7.4 - 4/9/2020, 2:41:17 AM - In Progress
Checking upgrade eligibility - 50% completed
Pre-Upgrade Check from 7.1 to 7.4 - 4/9/2020, 2:40:34 AM - Succeeded
Checking upgrade eligibility - Succeeded

Also, something is wrong with my node count:
Overview shows my intended "Number of nodes" = 5
Cluster health shows "Number of nodes" = 20

Since I don't have any control over the instances, I think I am stuck. I do not know what to do.

asked 2 years ago, 83 views
1 Answer

I wanted to share that AWS Support did help resolve this issue. Below is the information I received. Hopefully it helps somebody else who lands here from a search.

AWS Support:

To help you with this issue, I have already reached out to the internal team with the highest priority. Please be assured that any information I receive from them will be passed on to you as soon as possible.

AWS Support:

I received an update from the internal team: the domain is no longer in the processing state.

The internal team also told me that the cluster was stuck in the upgrade check for over 17 hours because of internal workflow failures.

Also, as you may be aware, whenever a configuration change is made on the cluster, a blue/green deployment takes place: an entirely new set of nodes is brought up and the data is transferred from the old nodes to the new nodes. During this time the node count increases and shows more than the configured number of nodes. This is why there were 20 nodes in the cluster before the configuration change completed. Please refer to the following documentation for more information:
Managing Amazon Elasticsearch Service Domains - About Configuration Changes
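
For anyone who wants to check this themselves, here is a minimal sketch (mine, not from AWS Support) that compares the configured node count with the live node count. It assumes you already have two dictionaries: the `DomainStatus` part of a boto3 `describe_elasticsearch_domain` response, and the JSON body of `GET /_cluster/health` from the domain endpoint. The function names and the wording of the returned messages are my own.

```python
def expected_node_count(domain_status: dict) -> int:
    """Nodes the domain should have when no configuration change is running.

    Dedicated master nodes also appear in the cluster health node count,
    so they are added to the data node count when enabled.
    """
    cfg = domain_status["ElasticsearchClusterConfig"]
    count = cfg["InstanceCount"]
    if cfg.get("DedicatedMasterEnabled"):
        count += cfg.get("DedicatedMasterCount", 0)
    return count


def describe_node_gap(domain_status: dict, cluster_health: dict) -> str:
    """Classify a mismatch between configured and live node counts."""
    expected = expected_node_count(domain_status)
    live = cluster_health["number_of_nodes"]
    processing = domain_status.get("Processing", False)
    if live == expected:
        return f"ok: {live} nodes, matches configuration"
    if live > expected and processing:
        # Extra nodes while a change is processing fit a blue/green deployment.
        return f"transient: {live} live vs {expected} configured (change in progress)"
    return f"investigate: {live} live vs {expected} configured, processing={processing}"


# Values from this thread: 5 configured nodes, 20 shown by cluster health.
status = {
    "Processing": True,
    "ElasticsearchClusterConfig": {"InstanceCount": 5, "DedicatedMasterEnabled": False},
}
print(describe_node_gap(status, {"number_of_nodes": 20}))
```

A "transient" result only says the extra nodes are consistent with a blue/green deployment. It cannot tell you whether the deployment itself is stuck, which in my case needed AWS Support.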

As stated by the internal team, the cluster configuration changes did not go through because of internal workflow failures, and the domain appeared to be stuck in processing. To resolve this, the internal team had to perform multiple manual retries from their end to make the cluster healthy and bring it back to the configured number of nodes.


Further, from the backend I can see that your cluster has no dedicated master nodes. AWS ES best practices recommend having dedicated master nodes.

  • Having dedicated master nodes:
It is highly recommended to have dedicated master nodes for an ES domain to improve its stability. A dedicated master node does not hold any data or respond to data upload requests; it only performs cluster management tasks. This in turn increases the stability of your domain. It is recommended to have a minimum of 3 dedicated master nodes.
Please refer to the following documentation for more information about the importance and requirements of dedicated master nodes.
[1] Amazon Elasticsearch Service Best Practices
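
As a rough illustration of that recommendation (again my own sketch, not something AWS Support sent), the cluster configuration for dedicated masters could be built and applied with boto3 as below. The helper names, the domain name, and the instance type `c5.large.elasticsearch` are placeholders I picked. The odd-count check is my own assumption to keep a clear quorum; Support only said "a minimum of 3".

```python
def dedicated_master_config(instance_type: str, count: int = 3) -> dict:
    """Build the ElasticsearchClusterConfig fragment for dedicated masters."""
    # Assumption: require an odd count of at least 3 so a majority always exists.
    if count < 3 or count % 2 == 0:
        raise ValueError("use an odd number of dedicated masters, at least 3")
    return {
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": instance_type,
        "DedicatedMasterCount": count,
    }


def enable_dedicated_masters(domain_name: str, instance_type: str, count: int = 3) -> dict:
    """Apply the change to a domain. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("es")
    return client.update_elasticsearch_domain_config(
        DomainName=domain_name,
        ElasticsearchClusterConfig=dedicated_master_config(instance_type, count),
    )


# Placeholder instance type; choose one that suits your domain.
print(dedicated_master_config("c5.large.elasticsearch"))
```

Keep in mind that applying this is itself a configuration change, so I would expect the node count to rise temporarily again while it runs.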

I hope the above information is helpful. Please feel free to update the case correspondence if you have any further queries and I would be glad to assist you further.

My closing thoughts:

I appreciate the support and am happy that the issue was resolved. Still, getting this help required that I purchase a support plan. I have asked for clarification on whether I did anything to cause the failure, since so far it has only been explained as "internal workflow failures".
answered 2 years ago
