Skip to content

How to decide whether to use a 1:1, 3:1, or 6:1 sharding strategy for a OpenSearch cluster of 3 data nodes?

5 minute read
Content level: Advanced
0

This article provides clear guidance on how to choose the sharding strategy for an Amazon OpenSearch Service cluster with 3 data nodes. The purpose of this article is to explain, in simple and practical terms, the key factors to consider especially index size, workload type, and high availability.

Sharding Strategy Guide for a 3-Data-Node OpenSearch Cluster

Choosing the right sharding strategy is critical for performance, stability, and fault tolerance in Amazon OpenSearch Service. Here's a comprehensive guide to help you make the right decision for your 3-data-node cluster.

Understanding Sharding Notation

First, let's clarify the notation to avoid confusion:

StrategyPrimary ShardsReplicas per PrimaryTotal ShardsCalculation
1:11121 primary + 1 replica = 2
3:13163 primary + 3 replica = 6
6:161126 primary + 6 replica = 12

Important: The notation "3:1" means 3 primary shards with 1 replica each, resulting in 6 total shards (not 9). Each primary shard gets exactly one replica copy.

Recommended Sharding Strategy for a 3 Data Nodes

A general guideline is to try to keep shard size between 10–30 GiB for workloads where search latency is a key performance objective, and 30–50 GiB for write-heavy workloads such as log analytics.

Index SizeRecommended StrategyTotal ShardsUse Case
<10 GB1:12Test environments or very small production indexes
10-150 GB3:16Most production workloads (recommended)
>150 GB6:112Large indexes with high search/query load

Key Factors to Consider

1. Fault Tolerance (Most Critical)

With only three data nodes, you must have at least one replica for production workloads.

  • Why? If one node fails without replicas, your cluster goes red (data loss).
  • With one replica: If one node fails, the cluster stays yellow (fully functional, degraded redundancy).
  • Golden Rule: Never use zero replicas in production on a three-node cluster.
  • Recommendation: Start with a minimum of 3:1 to ensure every shard has a backup copy.

2. Target Shard Size

AWS and OpenSearch best practices recommend keeping primary shard sizes between 20-50 GB (maximum ~50 GB).

Formula to Calculate Primary Shards:

Number of primary shards ≈ Total index size ÷ Target shard size

Examples:

Total Index SizeTarget Shard SizeCalculated Primary ShardsRecommended Strategy
5 GB20-50 GB11:1 (2 total shards)
60 GB30 GB2= 3:1 (6 total shards)
120 GB= 40 GB= 3= 3:1 (6 total shards)
240 G= 40 GB= 6= 6:1 (12 total shards)

Why shard size matters:

  • Too large (>50 GB): Slow recovery, rebalancing, and search performance issues.
  • Too small (<10 GB): Excessive overhead, increased memory usage, larger cluster state.

3. Workload Type

Workload CharacteristicRecommended ApproachReasoning
Search-heavy / High QPSLean toward 6:1More shards = better query parallelism
Write-heavy / LoggingPrefer 3:1Fewer shards = less indexing overhead
Balanced read/writeStart with 3:1Safe, efficient starting point

4. Even Distribution & Cluster Health

  • Best Practice: Number of primary shards should be a multiple of your data nodes (e.g., 3) to ensure even distribution
    • Good: 3, 6, 9, 12 primary shards
    • Avoid: 4, 5, 7 primary shards (creates uneven distribution)
  • Shard Limits: Keep total shards per node under 1,000-1,500 for stability
    • Rule of thumb: ~1.5 vCPU per active shard

5. Future Growth Considerations

If you expect your index to grow significantly:

  • Start with 3:1 even if current data is small (<15 GB)
  • Avoids costly reindexing later as data grows
  • Provides headroom for growth without immediate reindexing

6. Monitoring and Adjusting

After implementing your sharding strategy, monitor these metrics:

  • Shard size: Use GET _cat/shards?v to check actual shard sizes
  • Search latency: High latency may indicate need for more shards (move to 6:1)
  • Indexing throughput: Low throughput may indicate too many shards (consider 3:1)
  • Cluster health: Yellow/red status requires immediate attention
  • Node CPU/memory: High utilization may indicate over-sharding

Reference links

AWS
EXPERT
published a month ago136 views