How to combine ISM policy actions to control shard size

Reading time: 5 minutes
Content level: Expert

Optimizing index shard size is important for performance and stability in Amazon OpenSearch Service. Several customers run production indices with very high ingestion rates and need to control shard size using more than one condition in their ISM policy, but there appears to be no article with a relevant example. This article is intended to help such customers adopt this approach quickly.

OpenSearch is one of the key AWS services used in operational analytics to gain near-real-time insight into streaming data from sources such as IoT devices in manufacturing environments, charging stations in electric vehicle ecosystems, clickstream data from retail and e-commerce websites, and security log streams used for real-time threat detection. Optimal index shard sizing is a key factor in achieving stable operation and maximum performance in Amazon OpenSearch Service domains.

The recommended shard sizing strategy in OpenSearch is to keep the primary shard size between 10 and 30 GiB for search workloads, or between 30 and 50 GiB for log workloads. 50 GiB should be the maximum, so be sure to plan for growth. An Index State Management (ISM) policy can be used to meet these sizing criteria automatically. A common best practice is to use an index alias for indexing (writes) and to have the ISM policy automatically roll the alias over to a new index based on the age of the index (for example, 1 hour or 1 day). The following ISM rollover policy is based on index age.

{
  "policy": {
    "description": "Example rollover policy.",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          {
            "rollover": {
              "min_index_age": "1h"
            }
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["roll_index*"],
      "priority": 1
    }
  }
}

However, in OpenSearch deployments where the ingestion rate spikes suddenly and document sizes are non-uniform and unpredictable, a shard can grow past the recommended maximum of 30 GiB or 50 GiB well before the index reaches the age condition (for example, 1 hour) that triggers the rollover. The result is very large and irregular index and shard sizes, as shown below, which can lead to storage skew across data nodes and associated performance issues.

GET /_cat/indices?v&h=index,pri,rep,docs.count,store.size,pri.store.size,creation.date.string&s=store.size:desc

index              pri rep docs.count store.size pri.store.size creation.date.string
roll_index-000005  5   1   35589885   960gb      480gb          2024-09-20T12:01:57.490Z
roll_index-000004  5   1   20088976   751gb      375.5gb        2024-09-20T11:02:57.490Z
roll_index-000001  5   1   21575847   699gb      349.5gb        2024-09-20T10:00:57.490Z
roll_index-000003  5   1   16767899   678gb      339gb          2024-09-20T09:03:57.490Z
roll_index-000002  5   1   18698153   562gb      281gb          2024-09-20T08:01:57.490Z

The largest index in the output above, roll_index-000005, was created at 12 PM on 20 September 2024 and holds 480 GB of primary storage across 5 primary shards, which works out to 96 GB per primary shard (480 GB / 5 shards). The indices created in the other hours of the same day (8 AM to 11 AM) have different index and shard sizes. A likely cause is an unexpected surge in source data that drove up the indexing rate during the 12 PM hour, before the ISM policy condition ("min_index_age": "1h") was met. To address this, multiple conditions can be combined in the ISM rollover action so that the rollover to a new index happens as soon as any one of them is met. In the following ISM policy, the shard size threshold is set to 40 GB and the index age threshold to 1 hour, so the index alias rolls over to a new index when either condition is met, which keeps the shard size from growing far beyond the recommended maximum.

{
  "policy": {
    "description": "Example rollover policy.",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          {
            "rollover": {
              "min_primary_shard_size": "40gb",
              "min_index_age": "1h"
            }
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["roll_index*"],
      "priority": 1
    }
  }
}
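
The per-shard arithmetic and the either-condition behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not the ISM plugin's implementation: the function names per_shard_gb and rollover_due are made up for this example, the even spread of data across shards is an assumption, and the sample figures come from the _cat/indices output shown earlier.

```python
def per_shard_gb(pri_store_gb: float, primary_shards: int) -> float:
    """Average primary shard size, assuming data is spread evenly across shards."""
    return pri_store_gb / primary_shards


def rollover_due(shard_gb: float, age_hours: float,
                 min_shard_gb: float = 40.0, min_age_hours: float = 1.0) -> bool:
    """Illustrative model of a rollover action with two conditions.

    Either condition on its own is enough to trigger the rollover.
    The use of >= is an assumption made for this sketch.
    """
    return shard_gb >= min_shard_gb or age_hours >= min_age_hours


# roll_index-000005: 480 GB of primary storage across 5 primary shards
print(per_shard_gb(480, 5))  # 96.0

# Size threshold reached long before the index is one hour old
print(rollover_due(shard_gb=41.0, age_hours=0.2))  # True
# Small index that has reached the age threshold
print(rollover_due(shard_gb=5.0, age_hours=1.0))   # True
# Neither condition met yet
print(rollover_due(shard_gb=5.0, age_hours=0.5))   # False
```

With the age-only policy, the first case would have kept growing for the rest of the hour, which is how the 96 GB shards in the earlier output came about.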

The following step-by-step procedure implements the automatic rollover ISM policy.

  1. Create an index template that specifies the rollover alias name (for example, roll_index).

PUT _index_template/ism_rollover_template_1
{
  "index_patterns": ["roll_index*"],
  "template": {
    "settings": {
      "index.refresh_interval": "30s",
      "index.number_of_shards": "1",
      "index.number_of_replicas": "1",
      "index.plugins.index_state_management.rollover_alias": "roll_index"
    }
  }
}
  2. Create an ISM policy that specifies both the index age and shard size rollover conditions, with an ism_template that matches the index pattern.
PUT _plugins/_ism/policies/rollover_policy_1
{
  "policy": {
    "description": "Example rollover policy.",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          {
            "rollover": {
              "min_primary_shard_size": "40gb",
              "min_index_age": "1h"
            }
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["roll_index*"],
      "priority": 1
    }
  }
}

  3. Create the first rollover index together with its rollover alias (roll_index).
PUT roll_index-000001
{
  "aliases": {
    "roll_index": {
      "is_write_index": true
    }
  }
}
  4. Validate the index and shard sizes. The output below shows that multiple indices are created within the same hour (the 10 AM and 11 AM hours) whenever the shard size reaches the policy threshold of 40 GB (for example, 204 GB / 5 primary shards = 40.8 GB). Because the ISM plugin evaluates policy conditions every 5 minutes, the actual shard size may not exactly match the value set in the policy, as noted in the OpenSearch rollover documentation.

GET /_cat/indices?v&h=index,pri,rep,docs.count,store.size,pri.store.size,creation.date.string&s=store.size:desc

index              pri rep docs.count store.size pri.store.size creation.date.string
roll_index-000006  5   1   35589885   408gb      204gb          2024-09-22T10:10:58.490Z
roll_index-000008  5   1   20088976   406gb      203gb          2024-09-22T10:21:58.490Z
roll_index-000009  5   1   21575847   410gb      205gb          2024-09-22T10:36:58.490Z
roll_index-000010  5   1   16767899   409gb      204.5gb        2024-09-22T11:02:57.490Z
roll_index-000011  5   1   18698153   407gb      203.5gb        2024-09-22T11:26:57.490Z
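
Because the conditions are evaluated periodically rather than continuously, a shard can keep growing for up to one check interval after it crosses the threshold. The sketch below is a rough back-of-the-envelope model of that overshoot, assuming a constant ingestion rate per shard and the 5-minute check interval mentioned above. The 0.5 GB per minute rate and both function names are hypothetical, chosen only for illustration, and the time taken by the rollover itself is ignored.

```python
CHECK_INTERVAL_MIN = 5  # check interval mentioned above


def worst_case_shard_gb(threshold_gb: float, ingest_gb_per_min: float,
                        interval_min: float = CHECK_INTERVAL_MIN) -> float:
    """Upper bound on shard size if the threshold is crossed right after a check."""
    return threshold_gb + ingest_gb_per_min * interval_min


def threshold_for_cap(cap_gb: float, ingest_gb_per_min: float,
                      interval_min: float = CHECK_INTERVAL_MIN) -> float:
    """Largest threshold that keeps the worst case at or below cap_gb."""
    return cap_gb - ingest_gb_per_min * interval_min


# Hypothetical rate: each primary shard receives 0.5 GB per minute
print(worst_case_shard_gb(40, 0.5))  # 42.5
print(threshold_for_cap(50, 0.5))    # 47.5
```

Under these assumptions, a 40 GB threshold leaves comfortable headroom below the 50 GiB guideline. A much higher per-shard ingestion rate would call for a lower threshold.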

Note: The automatic rollover feature creates a new index whenever an ISM policy condition is met, which increases the number of shards in the domain accordingly. As with shard size, it is important to control the number of shards in the domain to maintain stability and performance. The total number of shards that a data node can hold is proportional to the node's Java virtual machine (JVM) heap memory, and Amazon OpenSearch Service also has a limit of 1,000 shards per node. An ISM policy can be used to move older indexes to a different storage tier, delete them when they are no longer needed, or roll up several indexes into an aggregate index.
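
To get a feel for how quickly rollovers consume the shard budget, the following sketch counts the shards each rolled-over index adds and how many such indexes fit under the 1,000 shards per node limit. It is a simplified estimate: the three-node domain is a hypothetical example, the function names are invented here, and shards from other indexes and the JVM heap constraint are ignored.

```python
def shards_per_index(primary_shards: int, replicas: int) -> int:
    """Total shards one index adds: each primary plus its replica copies."""
    return primary_shards * (1 + replicas)


def indexes_until_limit(data_nodes: int, primary_shards: int, replicas: int,
                        per_node_limit: int = 1000) -> int:
    """Rolled-over indexes that fit under the shard limit if none are deleted."""
    total_limit = data_nodes * per_node_limit
    return total_limit // shards_per_index(primary_shards, replicas)


# 5 primary shards and 1 replica, as in the outputs above
print(shards_per_index(5, 1))        # 10

# Hypothetical domain with 3 data nodes
print(indexes_until_limit(3, 5, 1))  # 300
```

At several rollovers per hour, as in the validation output above, 300 indexes would be reached within days, which is why pairing the rollover with tiering, deletion, or rollup states matters.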
