Based on your description, your OpenSearch cluster is experiencing a serious issue with 0 active data nodes, unresponsiveness, and a configuration change stuck at 20% for over 14 hours.
This is a common problem that can occur with Amazon OpenSearch Service. When configuration changes get stuck in the "Processing" or "Modifying" state (showing as 20% complete in your case), it typically indicates an underlying issue with the cluster.
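One quick way to confirm the stuck state from the API side is the `Processing` flag on the domain status. A minimal sketch (the live call is commented out because it requires boto3 and AWS credentials; `my-domain` is a placeholder):

```python
def stuck_processing(domain_status: dict) -> bool:
    """Return True if a DomainStatus dict (as returned by
    opensearch.describe_domain) reports an in-flight configuration
    change or version upgrade."""
    return bool(domain_status.get("Processing")) or bool(
        domain_status.get("UpgradeProcessing")
    )

# Live usage (requires boto3 and credentials; "my-domain" is a placeholder):
# import boto3
# client = boto3.client("opensearch")
# status = client.describe_domain(DomainName="my-domain")["DomainStatus"]
# print(stuck_processing(status))
```

While the configuration change is applying, `Processing` stays `True`; many hours in that state with zero active data nodes is what usually warrants a support case.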
Here are some steps to address this:

- Check CloudWatch metrics for your domain to identify potential causes like storage issues or high CPU usage. These metrics might reveal if your nodes were experiencing resource constraints before they became unresponsive.
- Review the AWS Knowledge Center article about OpenSearch domains stuck in the processing state, which provides specific troubleshooting steps for this scenario.
- If the above doesn't help, you'll need to create a ticket with AWS Technical Support. This is especially important if you need to recover data from the unresponsive cluster.
- As a last resort, some users have resolved similar issues by creating a new OpenSearch instance and deleting the stuck one. However, this approach would result in data loss if you haven't backed up your indices.
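The CloudWatch check in the first step can be sketched as follows. The 80% CPU and 2 GB free-storage thresholds are assumptions for illustration, not AWS recommendations, and `fetch_cpu_averages` needs boto3 plus credentials to actually run:

```python
def fetch_cpu_averages(domain: str, client_id: str, hours: int = 3) -> list:
    """Pull hourly CPUUtilization averages for an OpenSearch domain.

    Requires boto3 and AWS credentials; domain and client_id (your
    account ID) are placeholders you must supply.
    """
    import boto3
    from datetime import datetime, timedelta, timezone

    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    resp = cw.get_metric_statistics(
        Namespace="AWS/ES",
        MetricName="CPUUtilization",
        Dimensions=[
            {"Name": "DomainName", "Value": domain},
            {"Name": "ClientId", "Value": client_id},
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=3600,
        Statistics=["Average"],
    )
    return [d["Average"] for d in resp["Datapoints"]]


def looks_resource_constrained(
    cpu_averages: list,
    free_storage_averages_mb: list,
    cpu_threshold: float = 80.0,   # assumed threshold, tune for your workload
    storage_floor_mb: float = 2048.0,  # assumed floor; FreeStorageSpace is in MB
) -> bool:
    """Flag sustained high CPU or low free storage from CloudWatch datapoints."""
    high_cpu = (
        len(cpu_averages) > 0
        and sum(cpu_averages) / len(cpu_averages) > cpu_threshold
    )
    low_storage = (
        len(free_storage_averages_mb) > 0
        and min(free_storage_averages_mb) < storage_floor_mb
    )
    return high_cpu or low_storage
```

If this returns `True` for the window just before the nodes went unresponsive, resource pressure (rather than the configuration change itself) is the likely root cause.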
The fact that you tried to increase data nodes from 1 to 2 during an already problematic state might have complicated the situation, as making configuration changes to an unhealthy cluster can sometimes exacerbate issues.
Sources:

- OpenSearch Configuration Change Stuck at 20% and 503 Error Responses | AWS re:Post
- Amazon OpenSearch Service Stuck at Modifying | AWS re:Post
Practically, you should not scale your data nodes down to zero. Unlike some other AWS clustered services, where you can set the node count to zero and still bring the cluster back, this does not work in Amazon OpenSearch Service. I have experienced this personally; there is no specific documentation for it, but it comes from an actual scenario.

Same here. Two OpenSearch clusters in two projects. In eu-central-1 there were some breakdowns and issues today; in eu-west-1, a huge problem lasting several hours: cluster health is "red", only 1 node active, and the cluster is stuck applying changes. Dashboards are unavailable with a 502 Gateway error.