ElasticSearch Domain not responsive anymore


Our ElasticSearch domain unexpectedly stopped responding. Because we can no longer access the logs, we cannot determine the cause. We noticed that the EBS volume appeared to be at 3 GB, but we cannot verify whether this is actually why the domain is unresponsive. We also see no way to find out why this happened (as mentioned above, the logs are not accessible).
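For reference, storage metrics are published to CloudWatch and can be read without going through the domain endpoint. The following is a minimal sketch of how the `FreeStorageSpace` metric could be queried with boto3, assuming the domain reports under the `AWS/ES` namespace with the `DomainName` and `ClientId` (account ID) dimensions. The function names and the domain name, account ID, and region passed in are hypothetical placeholders, not values from our setup.

```python
from datetime import datetime, timedelta, timezone


def build_free_storage_query(domain_name, account_id, hours=24, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    domain_name and account_id are placeholders supplied by the caller.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Minimum"],
    }


def lowest_free_storage(datapoints):
    """Return the smallest 'Minimum' value among the datapoints, or None if empty."""
    values = [point["Minimum"] for point in datapoints]
    return min(values) if values else None


def fetch_lowest_free_storage(domain_name, account_id, region):
    """Query CloudWatch; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    query = build_free_storage_query(domain_name, account_id)
    response = cloudwatch.get_metric_statistics(**query)
    return lowest_free_storage(response["Datapoints"])
```

A value near zero in the hours before the domain stopped responding would support the suspicion that the volume ran full. An empty result only means that no datapoints were found for the given dimensions and time window.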

We started an update to double the size of the EBS volume using the AWS web console, as we expected that additional space might help. The update was marked as 'green', so we went ahead with it. Even after 24 hours, the update is stuck in the 'Processing' state. According to the documentation, this means that the update of the EBS volume failed (https://aws.amazon.com/premiumsupport/knowledge-center/opensearch-domain-stuck-processing/).
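The 'Processing' state shown in the console can also be read through the service API. Below is a minimal sketch using the boto3 `opensearch` client and its `describe_domain` call, assuming that client can describe this domain. The helper names, domain name, and region are hypothetical placeholders.

```python
def summarize_domain_status(status):
    """Extract the fields relevant to a stuck update from a DomainStatus dict."""
    ebs = status.get("EBSOptions", {})
    return {
        "processing": status.get("Processing", False),
        "ebs_enabled": ebs.get("EBSEnabled"),
        "volume_size_gib": ebs.get("VolumeSize"),  # configured size per data node
    }


def fetch_domain_summary(domain_name, region):
    """Call the service API; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("opensearch", region_name=region)
    response = client.describe_domain(DomainName=domain_name)
    return summarize_domain_status(response["DomainStatus"])
```

This only reports the configuration state that the service exposes. It does not reveal why a change is stuck.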

We have found a long list of reasons for why this update may have failed:

  • CDI Failure: creation of new DI failed
  • Customer makes simultaneous configuration changes on the cluster
  • Failed CDI due to overloaded master node or any other activity failures like insufficient IP addresses in the customer subnet
  • Nodes went out of service due to heavy processing load or internal hardware failure
  • Previous DDI Failure: New CDI executed before Previous DDI activities are completed.
  • Large number of shards and continuous node failures due to high JVM memory pressure and CPU usage.
  • Stuck shard relocation due to insufficient free storage in the new DI, custom shard routing and service/es issues
  • Single domain can usually have 2 DIs at maximum. If CDI happens before previous DDI is completed, new CDI is queued but won't be executed.

However, this list does not help us resolve the issue: our ElasticSearch domain is currently in a state that we cannot change. The only option we see is to delete the domain and start a new one. We would therefore like to ask whether there is anything AWS can do to bring the domain back to life. We need the logs that are stored in the domain.

asked 2 years ago, 337 views
1 Answer

Hi,

AWS Support could definitely help you retrieve the logs, understand the root cause of your issue, and possibly fix the existing cluster.

Your best next step would be to open a Support case. If you do not yet have a Support subscription, consider upgrading to Developer Support or Business Support, depending on the urgency of your case.

Without looking at the internal logs or having more detailed information on the situation, it is not possible to provide any other guidance, sorry.

AWS
EXPERT
answered 2 years ago
