OpenSearch Nodes Disappear, don't drop to zero?

Question

We're having an issue where our UAT AWS OpenSearch domain seems to drop its node occasionally, yet the node count metric never drops to zero. As it's our testing environment, we only run one OpenSearch node. For some reason the node disappears and then comes back with only the `.kibana` index still available, after which we can restore from our last snapshot.

I tried to set up an automated task to watch for the node count dropping to zero by creating an EventBridge alert. The problem is that the metric doesn't drop to zero, as you can see from the metric snapshot below, so the EventBridge alert never triggers.

![Total Nodes disappear](https://repost.aws/media/postImages/original/IM7i8GmTGhSPe8TFk_CqqHVg)

Why doesn't it drop to zero? How can I automate the restoration when the node eventually comes back? At the very least, how can I trigger an alert when the nodes disappear?

Answer

Hi,

I hope you were able to find a solution to the issue, but if not, I can share some pointers:

1) Since you have a single-node cluster, the missing metrics can logically be treated as a drop to zero: when the node disappears, no metric is sent to CloudWatch at all. I assume that with this setup you don't have dedicated master nodes, which would explain why there is a gap in the node count metric.

2) A single node can hold only a finite amount of data, and that limit does not always correspond to the free disk space. Other metrics matter too, such as JVM memory pressure and the number of shards. On a single-node cluster, I have seen node restarts and data loss when JVM memory pressure stays above 75% for a longer duration. Also look at CPU and memory utilization.

3) There are other metrics you can watch as well, such as the cluster status being non-green. These other alerts or metrics may fire before the actual node restart and give you a heads-up about the issue.

4) In CloudWatch, you can configure an alarm to treat missing data as breaching the threshold. That should notify you when the node disappears.
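
As a rough sketch of point 4 (and the JVM pressure check from point 2), something like the following could create the alarms with boto3. The domain name, account ID, SNS topic ARN, alarm naming scheme, period and evaluation counts are all placeholders I made up for illustration, so adjust them to your setup. It assumes the domain publishes the `Nodes` and `JVMMemoryPressure` metrics in the `AWS/ES` namespace with the `DomainName` and `ClientId` dimensions.

```python
def build_alarm_params(domain_name, account_id, sns_topic_arn,
                       metric_name="Nodes", statistic="Minimum",
                       comparison="LessThanThreshold", threshold=1.0,
                       evaluation_periods=3):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    All argument values are illustrative placeholders, not taken from
    the original post.
    """
    return {
        # Hypothetical naming scheme, pick whatever suits you.
        "AlarmName": f"{domain_name}-{metric_name}-alarm",
        "Namespace": "AWS/ES",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},  # AWS account ID
        ],
        "Statistic": statistic,
        "Period": 60,  # seconds
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        # The key setting: a gap in the metric counts as a breach,
        # so the alarm fires even though the value never reads zero.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params):
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # imported here so the builder works without boto3 installed
    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder values for illustration only.
TOPIC = "arn:aws:sns:us-east-1:123456789012:opensearch-alerts"

# Fires when the node count is below 1 or the metric is missing.
nodes_alarm = build_alarm_params("uat-search", "123456789012", TOPIC)

# Fires when JVM memory pressure stays above 75% for 15 one-minute periods.
jvm_alarm = build_alarm_params(
    "uat-search", "123456789012", TOPIC,
    metric_name="JVMMemoryPressure", statistic="Maximum",
    comparison="GreaterThanThreshold", threshold=75.0,
    evaluation_periods=15,
)

# Uncomment to actually create the alarms:
# create_alarm(nodes_alarm)
# create_alarm(jvm_alarm)
```

Note that with `TreatMissingData` set to `breaching`, the JVM alarm will also fire while the node is gone. If you only want one notification for that case, you could switch that alarm to `notBreaching` or `missing` instead.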

--Syd
