How to combine ISM policy actions to control shard size
Optimizing index shard size is very important to optimize performance and stability in Amazon OpenSearch Service . Several customers has production indices with very high ingestion rate and there is a requirement to control shard sizes using more than one condition in their ISM policy and there seems no such article with relevant example found. This article should help such customers to quickly adopt this solution.
OpenSearch is one of the key AWS service used in operational analytics to get near realtime analytics insight into streaming data from various sources like IoT devices used in manufacturing environments, charging stations in electric automobile eco systems, click stream data from retail/e-commerce websites and security log streams used for realtime threat detection. Optimum index shard sing is a key factor to achieve stable operation and maximum performance in AWS OpenSearch domains.
The best shard sizing strategy in OpenSearch is to set the primary shard size between 10–30 GiB (for search workloads), or between 30–50 GiB (for logs workloads). 50 GiB should be the maximum—be sure to plan for growth. ISM policy can be used to automatically achieve this shard sizing criteria. A common best practice is to use an index alias for indexing(write) and set the ISM policy to automatically rollover the alias to a new index based on age of the index(Ex: 1 hour or 1 day). Below given is such an ISM rollover policy based on index age.
{
"policy": {
"description": "Example rollover policy.",
"default_state": "rollover",
"states": [
{
"name": "rollover",
"actions": [
{
"rollover": {
"min_index_age": "1h"
}
}
],
"transitions": []
}
],
"ism_template": {
"index_patterns": ["roll_index*"],
"priority": 1
}
}
}
However, in OpenSearch deployments where there is sudden spike in data ingestion rate with non-uniform and unpredictable document size, the index/shard size can reach above mentioned recommended max size of 30GB or 50GB way before reaching the ISM policy’s index age condition(ex: 1 hour ) setting for the auto rollover action, causing very large and irregular index/shard sizes (as shown below), this can lead to storage skew in data nodes and associated performance issues.
GET /_cat/indices?v&h=index,pri,rep,docs.count,store.size,pri.store.size,creation.date.string&s=store.size:desc
index pri rep docs.count stor.size pri.store.size creation.date.string
roll_index-000005 5 1 35589885 960gb 480gb 2024-09-20T12:01:57.490Z
roll_index-000004 5 1 20088976 751gb 375.5gb 2024-09-20T11:02:57.490Z
roll_index-000001 5 1 21575847 699gb 349.5gb 2024-09-20T10:00:57.490Z
roll_index-000003 5 1 16767899 678gb 339gb 2024-09-20T09:03:57.490Z
roll_index-000002 5 1 18698153 562gb 281gb 2024-09-20T08:01:57.490Z
The largest index “roll_index-000005” from above screenshot created at 12PM on 20th Sep 2024 has 480gb of total primary storage spanning across 5 primary shards, this makes 96gb size for each primary shard (480gb/5 shards), but other indices created during other hours(8AM to 11AM) of the same day has different index and shard sizes, this could be due to unpredicted source data surge causing sudden spike in high indexing rate during the 12th hour before meeting the ISM policy condition("min_index_age": "1h") . To address this issue, we can combine multiple conditions in the ISM policy action definition which can perform the auto rollover to a new index when any of the conditions met. Below given is such an ISM policy, where the shard size threshold is set as 40GB and index age threshold is set as 1 hour, so that the index alias will auto rollover to a new index when either of the conditions are met, making sure the shard size will not overgrows the recommend max size.
{
"policy": {
"description": "Example rollover policy.",
"default_state": "rollover",
"states": [
{
"name": "rollover",
"actions": [
{
"rollover": {
"min_primary_shard_size": "40gb",
"min_index_age": "1h"
}
}
],
"transitions": []
}
],
"ism_template": {
"index_patterns": ["roll_index*"],
"priority": 1
}
}
}
Below given are the step by step procedure to implement the automatic rollover ISM policy.
- Create an index template to specifying the rollover alias name (ex: roll_index).
PUT _index_template/ism_rollover_template_1
{
"index_patterns": ["roll_index*"],
"template": {
"settings": {
"index.refresh_interval": "30s",
"index.number_of_shards": "1",
"index.number_of_replicas": "1",
"index.plugins.index_state_management.rollover_alias": "roll_index"
}
}
}
- Create an ISM template specifying both index age and shard size rollover conditions and index template to use.
PUT _plugins/_ism/policies/rollover_policy_1
{
"policy": {
"description": "Example rollover policy.",
"default_state": "rollover",
"states": [
{
"name": "rollover",
"actions": [
{
"rollover": {
"min_primary_shard_size": "40gb",
"min_index_age": "1h"
}
}
],
"transitions": []
}
],
"ism_template": {
"index_patterns": ["roll_index*"],
"priority": 1
}
}
}
- Create the rollover index and associated rollover alias (roll_index).
PUT roll_index-000001
{
"aliases": {
"roll_index": {
"is_write_index": true
}
}
}
- Validate the index and shard size. From below output we can see that multiple indices are created within the same hour(10th & 11th hours) when the shard size reaches the policy condition of 40GB (ex: 204gb/5primary shards = 40.8gb). Since the ISM plugin validates the policy conditions every 5 minutes, the shard size may not be the exact value set in the policy as mentioned in OpenSearch rollover documentation
GET /_cat/indices?v&h=index,pri,rep,docs.count,store.size,pri.store.size,creation.date.string&s=store.size:desc
index pri rep docs.count stor.size pri.store.size creation.date.string
roll_index-000006 5 1 35589885 408gb 204gb 2024-09-22T10:10:58.490Z
roll_index-000008 5 1 20088976 406gb 203gb 2024-09-22T10:21:58.490Z
roll_index-000009 5 1 21575847 410gb 205gb 2024-09-22T10:36:58.490Z
roll_index-000010 5 1 16767899 409gb 204.5gb 2024-09-22T11:02:57.490Z
roll_index-000011 5 1 18698153 407gb 203.5gb 2024-09-22T11:26:57.490Z
Note:- The automatic rollover feature will be creating a new index whenever an ISM policy condition is met, this lead to corresponding increase in the number of shards in the domain. Similar to controlling shard size, it is also important to control the number of shards in the domain to achieve stability and performance, the total number of shards that a data node can hold is proportional to the node’s Java virtual machine (JVM) heap memory, also there is a limit of 1,000 shards per node in Amazon OpenSearch service. ISM policy can be used to move older indexes into a different storage tier, delete when no longer needed or several indexes can be rolled up into an aggregate index