Distribute data on OpenSearch Data Nodes

Hello Everyone,

I have a question about AWS OpenSearch. We have created an OpenSearch cluster with 4 data nodes and 3 master nodes, with a volume size of 2500 GB. The issue is that one data node shows 232 GB of available space, while the other three data nodes show 1345 GB, 1329 GB, and 1325 GB. As I understand it, data node 1 stores more data than the other data nodes. Is there any way to split the data equally among all the data nodes?

1 Answer

This distribution is highly unusual, since OpenSearch normally does a good job of distributing data evenly across the nodes. Check the cluster state to see whether any shards are unassigned, whether the shards are evenly sized, whether shard allocation is disabled, and whether any nodes are excluded from allocation (perhaps accidentally). You also have the option of moving shards manually, though this should be your last resort.

You can also refer to this document and the related links within it: https://aws.amazon.com/premiumsupport/knowledge-center/opensearch-rebalance-uneven-shards/
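As a rough illustration of the "allocation disabled or nodes excluded" check, here is a minimal Python sketch. It assumes you have already fetched the JSON body of `GET _cluster/settings?flat_settings=true` from your domain (authentication and the HTTP call are left out, since they depend on your setup). The helper name `allocation_warnings` and the sample values, including the IP address, are hypothetical and only for demonstration.

```python
import json
from typing import Dict, List


def allocation_warnings(flat_settings: Dict) -> List[str]:
    """Flag cluster settings that can keep shards from being spread evenly.

    `flat_settings` is the parsed response of
    GET _cluster/settings?flat_settings=true
    """
    warnings = []
    for scope in ("persistent", "transient"):
        for key, value in flat_settings.get(scope, {}).items():
            # Anything other than "all" limits which shards may be allocated.
            if key == "cluster.routing.allocation.enable" and value != "all":
                warnings.append(f"{scope}: allocation restricted to '{value}'")
            # Anything other than "all" limits automatic rebalancing.
            elif key == "cluster.routing.rebalance.enable" and value != "all":
                warnings.append(f"{scope}: rebalancing restricted to '{value}'")
            # exclude._ip / exclude._name / exclude._host remove nodes from allocation.
            elif key.startswith("cluster.routing.allocation.exclude.") and value:
                warnings.append(f"{scope}: nodes excluded via {key}={value}")
    return warnings


# Hypothetical sample response, not taken from the cluster in this thread.
sample = json.loads("""
{
  "persistent": {"cluster.routing.allocation.enable": "primaries"},
  "transient": {"cluster.routing.allocation.exclude._ip": "10.0.0.1"}
}
""")

for warning in allocation_warnings(sample):
    print(warning)
```

An empty result only means these particular settings are not the cause. Unassigned shards and uneven shard sizes still need to be checked separately, for example with `GET _cat/shards?v`.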

--Syd

Syd
answered 2 years ago
  • Thanks for the suggestions. I investigated the issue and found that one data node stores two shards of the same index, which is why its available space is only 232 GB.

    When I checked how the index is distributed across the nodes, I got the results below:

    index         shard prirep state   docs     store ip      node
    testing_index 1     p      STARTED 53060777 1tb   x.x.x.x 3ca476458dc48402b73b52f20b9b6be4
    testing_index 3     p      STARTED 53076189 1tb   x.x.x.x f0dea9df2df7c2c1a151502fafd29e28
    testing_index 4     p      STARTED 53085330 1tb   x.x.x.x f0dea9df2df7c2c1a151502fafd29e28
    testing_index 2     p      STARTED 53078362 1tb   x.x.x.x 2b9c9b54732795e41836ab950066813a
    testing_index 0     p      STARTED 53070442 1tb   x.x.x.x 42e5f2d7ba49896f5b13c0c458d0d47c

  • Your index is configured with 5 shards, whereas you have only 4 data nodes. One of the data nodes therefore has to hold the extra shard and ends up with more data than the others. The simplest fix is to add a data node. From the information you shared, it also looks like there is only one index, with no replicas, and each shard is about 1 TB in size. That shard size is far too large (50 GB is the maximum recommended by AWS) and would affect the cluster's ability to recover from a data node outage. Depending on your use case, you should split the data into more shards or indices. Refer to this document, especially the Shard Strategy section: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html
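The reasoning in this reply can be checked with a small Python sketch. It tallies shards per node from the listing posted above, then estimates how many primary shards would keep each shard at or below 50 GB while dividing evenly across the data nodes. The helper names are hypothetical, and the 5120 GB total is an assumption (5 shards at the rounded "1tb" figure), so treat the result as a rough estimate rather than a sizing recommendation.

```python
import math
from collections import Counter

# Rows copied from the shard listing in this thread
# (columns: index shard prirep state docs store ip node).
LISTING = """\
testing_index 1 p STARTED 53060777 1tb x.x.x.x 3ca476458dc48402b73b52f20b9b6be4
testing_index 3 p STARTED 53076189 1tb x.x.x.x f0dea9df2df7c2c1a151502fafd29e28
testing_index 4 p STARTED 53085330 1tb x.x.x.x f0dea9df2df7c2c1a151502fafd29e28
testing_index 2 p STARTED 53078362 1tb x.x.x.x 2b9c9b54732795e41836ab950066813a
testing_index 0 p STARTED 53070442 1tb x.x.x.x 42e5f2d7ba49896f5b13c0c458d0d47c
"""


def shards_per_node(listing: str) -> Counter:
    """Count STARTED shards per node (node is taken as the last column)."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[3] == "STARTED":
            counts[fields[-1]] += 1
    return counts


def suggest_primary_shards(total_gb: float, max_shard_gb: float, data_nodes: int) -> int:
    """Smallest shard count that respects max_shard_gb and is a multiple of data_nodes."""
    minimum = math.ceil(total_gb / max_shard_gb)
    return math.ceil(minimum / data_nodes) * data_nodes


counts = shards_per_node(LISTING)
for node, n in counts.most_common():
    print(node, n)  # the node holding 2 shards is the one that is short on space

# Assumption: about 5 x 1024 GB of primary data, 50 GB per shard at most, 4 data nodes.
print(suggest_primary_shards(5 * 1024, 50, 4))
```

A shard count that is a multiple of the data node count lets every node hold the same number of shards. The actual number should also account for expected data growth and any replicas.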
