As I understand it, the previous recommendation overlooks a critical detail in your metrics:
your cluster has Nodes=1. Because this is a single-node setup, administrative node restarts will not work (they require at least three data nodes), and the SSM runbook will fail because port 443 is not accepting connections.
"DP Nodes are OOS" on a single node means the underlying instance or Java process crashed completely (likely due to an Out-of-Memory event). Since the domain state is still "Active", your only self-service option is to force AWS to provision a new underlying host by triggering a Blue/Green deployment.
Try applying one of these configuration changes via the Console or AWS CLI:
- Instance Type Modification: Change the instance type minimally (e.g., from t3.medium.search to another supported instance family/size and back).
- Storage Adjustment: Increase the EBS volume size by a few gigabytes to force a volume modification and host re-evaluation.
- Toggle Dedicated Master Nodes: Temporarily enable Dedicated Master nodes (or toggle them off if enabled) to fundamentally shake up and rebuild the cluster topology.
Note: If the AWS Control Plane blocks these modifications with a 400 Bad Request or an internal error due to the OOS state, you cannot recover this yourself. If upgrading to a paid support plan is absolutely off the table, your only option is to delete and recreate the domain from a snapshot, or wait and hope that the automated AWS hypervisor health checks eventually cycle the unresponsive hardware.
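As a rough illustration, the storage adjustment option above could be scripted with boto3 along these lines. This is a sketch under assumptions, not a tested recovery procedure: the helper names `build_ebs_bump_request` and `trigger_blue_green` and the 5 GiB increment are made up for this example, and it assumes boto3 is installed and your credentials allow `es:DescribeDomain` and `es:UpdateDomainConfig`. It defaults to a dry run so you can inspect what the control plane reports before committing to anything.

```python
from typing import Any, Dict


def build_ebs_bump_request(domain_status: Dict[str, Any], extra_gib: int = 5) -> Dict[str, Any]:
    """Build update_domain_config kwargs that grow the EBS volume slightly.

    `domain_status` is the "DomainStatus" dict returned by describe_domain.
    The existing EBS settings are copied so only the size changes.
    """
    ebs = domain_status.get("EBSOptions", {})
    if not ebs.get("EBSEnabled"):
        raise ValueError("Domain does not use EBS storage; try a different change")
    if extra_gib <= 0:
        raise ValueError("extra_gib must be positive")
    new_ebs = dict(ebs)  # copy, so the caller's dict is left untouched
    new_ebs["VolumeSize"] = ebs["VolumeSize"] + extra_gib
    return {"DomainName": domain_status["DomainName"], "EBSOptions": new_ebs}


def trigger_blue_green(domain_name: str, dry_run: bool = True) -> Dict[str, Any]:
    """Submit the change (hypothetical helper; dry run by default)."""
    import boto3  # imported lazily so the builder above works without boto3
    from botocore.exceptions import ClientError

    client = boto3.client("opensearch")
    status = client.describe_domain(DomainName=domain_name)["DomainStatus"]
    request = build_ebs_bump_request(status)
    try:
        return client.update_domain_config(DryRun=dry_run, **request)
    except ClientError as err:
        # A validation or internal error here matches the "blocked by the
        # control plane" case described in the note above.
        code = err.response["Error"]["Code"]
        print(f"Control plane rejected the change: {code}")
        raise
```

Calling `trigger_blue_green("my-domain")` (domain name is a placeholder) only performs the dry run; pass `dry_run=False` to apply the change. It is worth checking the dry-run response to see whether the service would actually treat the change as a Blue/Green deployment, since not every volume change is guaranteed to trigger one.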
When your OpenSearch domain shows "DP Nodes are OOS" (Data Plane Nodes are Out of Service) and the cluster is completely unresponsive with connection refused errors, this indicates a critical issue where the OpenSearch process on your data nodes has stopped or failed.
Here are some steps you can take:
Immediate Actions:
- Restart the OpenSearch Process: Amazon OpenSearch Service provides administrative options to restart the OpenSearch process on a data node. You can perform this action through the AWS Management Console, AWS CLI, or AWS SDK. However, note that your domain must not be in a red state and all indexes must have replicas configured before initiating a restart.
- Restart the Data Node: If restarting the process doesn't resolve the issue, you can restart the data node itself. This is a more aggressive action but may be necessary when the OpenSearch process is completely unresponsive.
Important Prerequisites:
- Your domain needs at least three data nodes for process and node restarts to be supported
- Multi-AZ with Standby configurations are not supported for these administrative actions
- These options are available for Elasticsearch versions 7.x and higher
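Before attempting a restart, the prerequisites above can be checked against the domain description. The following is a minimal sketch: `restart_blockers` is a hypothetical helper, it takes the "DomainStatus" dict that `describe_domain` returns, and it assumes that any `OpenSearch_*` engine version qualifies (the list above only mentions Elasticsearch 7.x and higher). It does not check cluster health or replica configuration, which would require the OpenSearch APIs to be reachable.

```python
from typing import Any, Dict, List


def restart_blockers(domain_status: Dict[str, Any]) -> List[str]:
    """Return the reasons an administrative restart is not supported."""
    cluster = domain_status.get("ClusterConfig", {})
    blockers: List[str] = []

    # Process and node restarts need at least three data nodes.
    if cluster.get("InstanceCount", 0) < 3:
        blockers.append("fewer than three data nodes")

    # Multi-AZ with Standby domains are excluded from these actions.
    if cluster.get("MultiAZWithStandbyEnabled", False):
        blockers.append("Multi-AZ with Standby is enabled")

    # EngineVersion looks like "OpenSearch_2.11" or "Elasticsearch_7.10".
    engine, _, version = domain_status.get("EngineVersion", "").partition("_")
    if engine == "Elasticsearch":
        major = version.split(".")[0]
        if not major.isdigit() or int(major) < 7:
            blockers.append("Elasticsearch version is older than 7.x")

    return blockers
```

For the single-node domain discussed in this thread, the function would report the node count as a blocker, which is why the restart options are not available there. An empty list means only that these three prerequisites are met.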
Diagnostic Approach:
You can use the AWSSupport-TroubleshootOpenSearchRedYellowCluster automation runbook through AWS Systems Manager to help diagnose the issue. This runbook can identify causes of cluster health problems and provide recommended resolution steps. It works by creating a Lambda function that runs OpenSearch APIs against your cluster to determine the root cause.
If Administrative Actions Don't Work:
Given that your domain shows as "Active" but is completely unresponsive, and rollback has failed, this suggests a severe underlying issue. The "DP Nodes are OOS" message combined with connection refused errors typically indicates the data plane is not functioning at all.
Unfortunately, when administrative restart options are unavailable or ineffective and the domain is in this critical state, you may need to open a case with AWS Support through the AWS Support Center (https://console.aws.amazon.com/support/) and describe your situation there. You might also want to review your AWS Support plan options, as higher-tier plans provide faster response times for production-impacting issues like this.
Prevention for Future:
- Ensure proper replica configuration for all indexes
- Maintain adequate cluster capacity
- Monitor CloudWatch metrics for early warning signs
- Consider Multi-AZ deployments for better resilience
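As one example of the CloudWatch monitoring suggested above, an alarm on JVM memory pressure can give early warning before a node runs out of memory. The sketch below only builds the arguments for CloudWatch's `put_metric_alarm`; the helper name, the alarm name, and the 80% threshold over three 5-minute periods are illustrative assumptions, not values taken from this thread. OpenSearch Service publishes its metrics in the `AWS/ES` namespace with `DomainName` and `ClientId` (your account ID) dimensions.

```python
from typing import Any, Dict


def jvm_pressure_alarm(domain_name: str, account_id: str, threshold: float = 80.0) -> Dict[str, Any]:
    """Build put_metric_alarm kwargs for high JVM memory pressure.

    The threshold and evaluation window are illustrative; tune them
    for your own workload.
    """
    return {
        "AlarmName": f"{domain_name}-jvm-memory-pressure",
        "Namespace": "AWS/ES",
        "MetricName": "JVMMemoryPressure",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Maximum",
        "Period": 300,           # seconds per evaluation period
        "EvaluationPeriods": 3,  # must breach three periods in a row
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }
```

With boto3 installed, the result could be passed on as `boto3.client("cloudwatch").put_metric_alarm(**jvm_pressure_alarm("my-domain", "123456789012"))`, where both arguments are placeholders. Adding an `AlarmActions` entry with an SNS topic ARN would be needed for the alarm to notify anyone.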
Sources
Performing administrative actions on Amazon OpenSearch Service domains - Amazon OpenSearch Service
AWSSupport-TroubleshootOpenSearchRedYellowCluster - AWS Systems Manager Automation Runbook Reference
