How to configure/calculate shards and data nodes in an OpenSearch cluster?

I have created an AWS OpenSearch cluster for testing purposes. Since this is only for testing/PoC, I have configured it with one data node. Based on that single data node, I have set number_of_shards to 1 and number_of_replicas to 1. My question is: how many shards can I set if I have just one data node?

PUT /awesome_index
{
  "settings": {
    "index.knn": true,
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "vector_field": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw"
        }
      }
    }
  }
}
asked a month ago · 23 views
1 Answer

An index in OpenSearch is divided into one or more primary shards, and the primary shard count and the replica count are set separately.

For example:

  • A shard strategy of 1:1 (primary:replica) means that each primary shard has one replica.
  • A shard strategy of 5:2 means five primary shards, each with two replicas. To illustrate, it looks like this:
Primary A, replica A1, replica A2
Primary B, replica B1, replica B2
Primary C, replica C1, replica C2
Primary D, replica D1, replica D2
Primary E, replica E1, replica E2
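
If it helps to see the arithmetic, here is a minimal Python sketch of the primary:replica notation used above. The function names `shard_layout` and `total_shards` are made up for illustration and are not part of any OpenSearch client; the only thing it relies on is that the total number of shard copies is primaries × (1 + replicas).

```python
def shard_layout(primaries, replicas):
    """Return one row per primary shard: the primary followed by its replicas.

    Hypothetical helper for illustration only. Shards are labelled A, B, C, ...
    so this sketch assumes at most 26 primaries.
    """
    rows = []
    for i in range(primaries):
        name = chr(ord("A") + i)
        row = ["Primary " + name]
        row += ["replica {}{}".format(name, r) for r in range(1, replicas + 1)]
        rows.append(row)
    return rows


def total_shards(primaries, replicas):
    """Total shard copies in the index: each primary plus all of its replicas."""
    return primaries * (1 + replicas)


# A 5:2 strategy: five rows, three shard copies per row.
for row in shard_layout(5, 2):
    print(", ".join(row))

print(total_shards(5, 2))  # 15 shard copies in total
print(total_shards(1, 1))  # 2 shard copies in total
```

So a 5:2 index has 15 shard copies for the cluster to place, while the 1:1 index from the question has 2.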

Amazon OpenSearch Service uses a default shard strategy of 5:1.

So to answer your question:

How many shards can I set if I have just one data node?

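Technically, you can set almost any number of shards, as long as your node has enough memory and you stay within the per-node shard limit (the cluster.max_shards_per_node setting, 1,000 by default in recent versions). As an extreme example, you could set a shard strategy of 100:2.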

However, things that you need to take into account are:

  1. JVM memory pressure: shard metadata is kept in the Java Virtual Machine (JVM) heap, so the more shards you have, the higher the JVM memory pressure. This will cause issues for your cluster and, in the worst case, your node will drop out. It is recommended to keep JVM memory pressure under 75%.
  2. Cluster health: OpenSearch is distributed by nature, so it places a primary shard and its replicas on different data nodes. If you create an index with a 1:1 shard strategy on a single node, the replica has nowhere to go and stays unassigned, so a single-node cluster with replicas is always in yellow status. You can avoid this with a shard strategy of 1:0. However, that is only suitable for development; in production you will want replicas for high availability.
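
To make the cluster health point concrete, here is a rough Python sketch of the allocation rule described above. It is a simplified model, not OpenSearch's actual allocator: it assumes the only constraint is that copies of the same shard never share a node, and it ignores disk watermarks, allocation filtering, and per-node shard limits. The function name `expected_health` is hypothetical.

```python
def expected_health(primaries, replicas, data_nodes):
    """Estimate cluster status for one index under a simplified model.

    Assumption: a primary and its replicas must each live on a different
    data node, and nothing else blocks allocation.
    Returns (status, assigned_shards, unassigned_shards).
    """
    total = primaries * (1 + replicas)
    if data_nodes < 1:
        # No data node at all: not even the primaries can be assigned.
        return ("red", 0, total)
    # Each primary can place at most one copy of itself per data node.
    copies_per_primary = min(1 + replicas, data_nodes)
    assigned = primaries * copies_per_primary
    unassigned = total - assigned
    status = "green" if unassigned == 0 else "yellow"
    return (status, assigned, unassigned)


print(expected_health(1, 1, 1))    # ('yellow', 1, 1)
print(expected_health(1, 0, 1))    # ('green', 1, 0)
print(expected_health(100, 2, 1))  # ('yellow', 100, 200)
print(expected_health(5, 1, 2))    # ('green', 10, 0)
```

Under this model the 1:1 index on one node is yellow because its single replica is unassigned, and dropping to 1:0 turns it green. number_of_replicas is a dynamic index setting, so it can be changed on an existing index through the index settings API (PUT /awesome_index/_settings) without recreating the index.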
AWS SUPPORT ENGINEER
answered a month ago
  • @Derry_Yeh - Thank you very much, I appreciate it. I tried the 1:0 strategy and got green status. This is for development. As far as I know, OpenSearch takes snapshots too. So the risk with one data node, 1 shard, and 0 replicas is that if the data node is unavailable, we won't be able to search. Is that a correct assumption?
