Elasticsearch domain not responsive anymore

Our Elasticsearch domain unexpectedly stopped responding. Since we can no longer access the logs, we have no way to find out why. We noticed that the EBS volume appeared to be 3 GB in size, but we cannot verify whether this is actually the reason the domain is unresponsive. We also see no way to find out why this happened (as mentioned above, the logs are not accessible).
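For anyone in a similar situation: even when the domain itself does not respond, the storage history may still be visible in CloudWatch, which publishes a `FreeStorageSpace` metric for each domain in the `AWS/ES` namespace. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the function names, the domain name, the account ID, and the region are placeholders, not values from our setup.

```python
from datetime import datetime, timedelta, timezone


def build_free_storage_query(domain_name, account_id, hours=24, now=None):
    """Build the arguments for CloudWatch get_metric_statistics.

    domain_name and account_id are placeholders supplied by the caller.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Minimum"],
    }


def fetch_free_storage(domain_name, account_id, region):
    """Return hourly minimum free-storage datapoints, oldest first."""
    import boto3  # imported lazily so the query builder works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    response = cloudwatch.get_metric_statistics(
        **build_free_storage_query(domain_name, account_id)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Calling `fetch_free_storage("my-domain", "123456789012", "eu-central-1")` (all three values hypothetical) would show whether free space dropped toward zero before the domain stopped responding.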

We started an update to double the size of the EBS volume using the AWS web console, expecting that the additional space might help. The update was marked as 'green', so we went ahead with it. Even after 24 hours, the update is still stuck in the 'Processing' state. According to the documentation, this means that the EBS volume update failed (https://aws.amazon.com/premiumsupport/knowledge-center/opensearch-domain-stuck-processing/).
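The same 'Processing' state can also be read outside the web console. Below is a minimal sketch, assuming boto3 with configured credentials; it uses the `es` client's `describe_elasticsearch_domain` call, and the helper names, domain name, and region are placeholders.

```python
def summarize_domain_status(response):
    """Extract the fields relevant to a stuck update from a describe response."""
    status = response["DomainStatus"]
    ebs = status.get("EBSOptions", {})
    return {
        "processing": status.get("Processing", False),
        "ebs_enabled": ebs.get("EBSEnabled", False),
        "volume_size_gib": ebs.get("VolumeSize"),
    }


def fetch_domain_status(domain_name, region):
    """Query the service for the current configuration state of a domain."""
    import boto3  # imported lazily so the summary helper works without boto3

    client = boto3.client("es", region_name=region)
    response = client.describe_elasticsearch_domain(DomainName=domain_name)
    return summarize_domain_status(response)
```

If `processing` stays true long after the change was submitted, that matches what the console shows; it does not reveal the cause.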

We have found a long list of reasons why this update may have failed:

  • CDI Failure: creation of new DI failed
  • Customer makes simultaneous configuration changes on the cluster
  • Failed CDI due to overloaded master node or any other activity failures like insufficient IP addresses in the customer subnet
  • Nodes went out of service due to heavy processing load or internal hardware failure
  • Previous DDI Failure: New CDI executed before Previous DDI activities are completed.
  • Large number of shards and continuous node failures due to high JVM memory pressure and CPU usage.
  • Stuck shard relocation due to insufficient free storage in the new DI, custom shard routing and service/es issues
  • Single domain can usually have 2 DIs at maximum. If CDI happens before previous DDI is completed, new CDI is queued but won't be executed.

But this list does not help us resolve the problem: our Elasticsearch domain is in a state that we cannot change. The only option we see is to delete the domain and create a new one. However, we would like to ask whether there is anything AWS can do to bring the domain back to life. We need the logs that are stored in the domain.

Asked 2 years ago, 342 views
1 answer

Hi,

AWS Support can definitely help you retrieve the logs, understand the root cause of the issue, and possibly repair the existing cluster.

Your best next step is to open a Support case. If you do not yet have a Support subscription, consider upgrading to Developer Support or Business Support, depending on the urgency of your case.

Without access to the internal logs or more detailed information about the situation, it is not possible to provide any other guidance, sorry.

AWS
Expert
Answered 2 years ago
