
Benchmarking Instance Types for Amazon OpenSearch Workloads

12 minute read
Content level: Expert

A detailed performance comparison of Amazon OpenSearch's specialized OM2 and general-purpose M7g instances to help you optimize performance and cost.

Choosing the optimal instance type for Amazon OpenSearch clusters is crucial for balancing performance and cost. With AWS offering both the OpenSearch-specialized OM2 instances and the newer general-purpose M7g instances, organizations face an important decision.

While OM2 instances are tailored for OpenSearch with high memory-to-vCPU ratios, M7g instances bring the latest technology, promising enhanced overall performance. The best choice depends on your specific workload characteristics and requirements.

This article presents comprehensive benchmark comparisons between these instance types, providing DevOps teams and architects with actionable insights for making informed infrastructure decisions. We'll examine real-world performance metrics and cost implications to help you optimize your OpenSearch deployment.

Understanding Benchmark Testing in OpenSearch Optimization

Benchmark testing in OpenSearch is a systematic process of evaluating cluster performance under controlled conditions, measuring key metrics such as query latency, throughput, and resource utilization. For a distributed search engine like OpenSearch, benchmarking goes beyond simple performance testing: it shows how your cluster behaves under specific workload patterns and provides quantitative data for decisions about infrastructure, configuration, and scaling strategies. The four essential pillars of OpenSearch benchmark testing are as follows:

  1. Performance optimization: Focuses on measuring and improving query response times, throughput, and overall cluster efficiency. This helps teams validate configuration changes and understand the impact of different workload patterns.
  2. Capacity planning: Enables teams to make data-driven decisions about cluster sizing, shard allocation, and scaling strategies. It helps predict resource requirements for future growth and ensures reliable performance during peak loads.
  3. Cost management: Provides insights into resource utilization and helps optimize infrastructure spending. By understanding performance per dollar metrics, teams can make informed decisions about instance types and cluster configurations.
  4. Bottleneck identification: Helps pinpoint performance constraints across CPU, memory, network, and storage. Early identification of bottlenecks allows teams to address issues before they impact production workloads.

Understanding these pillars is crucial for conducting meaningful benchmark tests that drive improvements in your OpenSearch deployment.
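One way to express the performance-per-dollar idea from the cost management pillar is to divide sustained throughput by hourly cluster cost. The sketch below is illustrative only: `ops_per_dollar` is a hypothetical helper, and the throughput and price figures are made-up placeholders, not measurements or AWS prices.

```python
def ops_per_dollar(throughput_ops_per_sec, hourly_price_usd, node_count):
    """Operations completed per dollar of data-node spend."""
    if hourly_price_usd <= 0 or node_count <= 0:
        raise ValueError("price and node count must be positive")
    ops_per_hour = throughput_ops_per_sec * 3600
    cluster_cost_per_hour = hourly_price_usd * node_count
    return ops_per_hour / cluster_cost_per_hour


# Hypothetical inputs for illustration only. Replace them with your measured
# throughput and the current price for your Region and instance type.
candidates = {
    "instance-a": {"throughput": 1200.0, "price": 0.50, "nodes": 2},
    "instance-b": {"throughput": 1000.0, "price": 0.40, "nodes": 2},
}

for name, c in candidates.items():
    value = ops_per_dollar(c["throughput"], c["price"], c["nodes"])
    print(f"{name}: {value:,.0f} ops per dollar")
```

A higher figure means more work per dollar, but it only makes sense to compare values measured with the same workload and the same load level.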

Benchmark Setup and Methodology

OpenSearch Benchmark, a tool provided by the OpenSearch Project, comprehensively gathers performance metrics from OpenSearch clusters, including indexing throughput and search latency. Whether you’re tracking overall cluster performance, informing upgrade decisions, or assessing the impact of workflow changes, this utility proves invaluable.

We compare the performance of two clusters: one powered by OpenSearch-specialized OM2 instances and the other by the newer general-purpose M7g instances. The dataset comprises HTTP server logs from the 1998 World Cup website and is commonly used for ingestion-heavy and search-intensive scenarios, making it ideal for comparing instance performance in such tasks. With the OpenSearch Benchmark tool, we conduct experiments to assess various performance metrics, such as indexing throughput, search latency, and overall cluster efficiency. Our aim is to determine the most suitable configuration for our specific workload requirements.

You can install OpenSearch Benchmark directly on a host running Linux or macOS, or you can run OpenSearch Benchmark in a Docker container on any compatible host. OpenSearch Benchmark includes a set of workloads that you can use to benchmark your cluster performance. Workloads contain descriptions of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains indexes, data files, and operations invoked when the workload runs.

When assessing your cluster’s performance, it is recommended to use a workload similar to your cluster’s use cases, which can save you time and effort. Consider the following criteria to determine the best workload for benchmarking your cluster:

  • Use case: Selecting a workload that mirrors your cluster’s real-world use case is essential for accurate benchmarking. By simulating heavy search or indexing tasks typical for your cluster, you can pinpoint performance issues and optimize settings effectively. This approach makes sure benchmarking results closely match actual performance expectations, leading to more reliable optimization decisions tailored to your specific workload needs.
  • Data: Use a data structure similar to that of your production workloads. OpenSearch Benchmark provides example documents within each workload so that you can understand its mapping and compare it with your own data mapping and structure. Each workload also ships with files that describe its data types and index mappings for this comparison.
  • Query types: Understanding your query pattern is crucial for detecting the most frequent search query types within your cluster. Employing a similar query pattern for your benchmarking experiments is essential.

The OpenSearch Benchmarking Process follows a systematic workflow consisting of the following five key steps: 

  1.  Environment Setup: Configure a testing environment that closely mirrors your production setup. Ensure hardware meets minimum requirements (e.g., CPU, RAM, SSD storage) and set up an OpenSearch cluster or domain for benchmarking. When you select an instance, you should also think about which workloads you want to run. As a general rule, make sure that the OpenSearch Benchmark host has enough free storage space to store the compressed data and the fully decompressed data corpus once OpenSearch Benchmark is installed.

    • Hardware requirements

      • CPU: 8+ cores recommended
      • RAM: 16GB minimum, 32GB+ recommended
      • Storage: SSD with at least 3x the size of your test dataset (for example, 500 GB)
    • Software requirements

      • Python 3.8 or later (check with python3 --version)
      • pip (check with pip --version)
      • Git 1.9 or later (check with git --version)
    • Installing on Linux

      • After the required software is installed, install the OpenSearch Benchmark using the following command: pip install opensearch-benchmark
      • Verify the installation using the command below: opensearch-benchmark -h
      • Refer to the documentation for installing the OpenSearch Benchmark with Docker

  2. Select and Configure Workload: Choose a workload that matches your use case (e.g., http_logs, geonames). Workloads define datasets, queries, and operations to simulate real-world scenarios. Customize workload parameters if needed, such as target throughput or concurrency.

    | Workload Name | Document Count | Compressed Size | Uncompressed Size |
    |---|---|---|---|
    | http_logs | 247,249,096 | 1.2 GB | 31.1 GB |

    To see a list of default benchmark workloads, visit the opensearch-benchmark-workloads repository.
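As a rough illustration of the environment checks in step 1, the following Python sketch verifies the interpreter version, looks for pip and git on the PATH, and compares free disk space against the http_logs sizes from the table above. The `preflight` and `required_space_gb` names and the 20% headroom factor are assumptions made for this example; they are not part of OpenSearch Benchmark.

```python
import shutil
import sys

# Sizes for the http_logs workload, taken from the workload table above.
COMPRESSED_GB = 1.2
UNCOMPRESSED_GB = 31.1


def required_space_gb(compressed_gb, uncompressed_gb, headroom=1.2):
    """Space for the compressed download plus the decompressed corpus.

    The headroom multiplier is an assumed safety margin, not an official figure.
    """
    return (compressed_gb + uncompressed_gb) * headroom


def preflight(data_dir="."):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or later is required")
    for tool in ("pip", "git"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    free_gb = shutil.disk_usage(data_dir).free / 1e9
    needed_gb = required_space_gb(COMPRESSED_GB, UNCOMPRESSED_GB)
    if free_gb < needed_gb:
        problems.append(
            f"only {free_gb:.1f} GB free in {data_dir!r}, "
            f"about {needed_gb:.1f} GB needed"
        )
    return problems


for problem in preflight():
    print("WARNING:", problem)
```

Point `data_dir` at the volume where OpenSearch Benchmark will store its data, since that is where the corpus is downloaded and decompressed.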

  3. Data Ingestion: Load the workload dataset into the target OpenSearch cluster. This step prepares the index and ensures the data is ready for benchmarking operations.
  4. Run Benchmark Tests: Execute benchmark tests using OpenSearch Benchmark. Tests simulate operations like indexing, querying, and aggregations while collecting metrics such as latency, throughput, and system resource usage.

This example runs a benchmark with the http_logs workload and certificate verification disabled:

opensearch-benchmark execute-test \
  --target-hosts=https://opensearch-cluster-dns-name:9200 \
  --pipeline=benchmark-only \
  --workload=http_logs \
  --client-options=basic_auth_user:*****,basic_auth_password:******,verify_certs:false
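
If you script your runs, for instance to repeat the same test against both clusters, the command above can be assembled programmatically. The following is a minimal sketch: `build_command` and `run_benchmark` are hypothetical helpers, the host name and credentials are placeholders read from assumed environment variables, and the sketch passes `verify_certs` as the client option that controls certificate verification. The optional `--results-format` and `--results-file` flags are included on the assumption that your OpenSearch Benchmark version supports them (check `opensearch-benchmark execute-test -h`).

```python
import os
import shlex
import subprocess


def build_command(target_host, workload, user, password,
                  verify_certs=False, results_file=None):
    """Assemble the execute-test invocation as an argument list."""
    client_options = (
        f"basic_auth_user:{user},basic_auth_password:{password},"
        f"verify_certs:{str(verify_certs).lower()}"
    )
    cmd = [
        "opensearch-benchmark", "execute-test",
        f"--target-hosts={target_host}",
        "--pipeline=benchmark-only",
        f"--workload={workload}",
        f"--client-options={client_options}",
    ]
    if results_file:
        # Assumed optional flags for writing the summary report to a file.
        cmd += ["--results-format=csv", f"--results-file={results_file}"]
    return cmd


def run_benchmark(cmd):
    """Launch the benchmark and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# OSB_USER and OSB_PASSWORD are hypothetical variable names for this sketch.
command = build_command(
    "https://opensearch-cluster-dns-name:9200",
    "http_logs",
    os.environ.get("OSB_USER", "<user>"),
    os.environ.get("OSB_PASSWORD", "<password>"),
)
print(shlex.join(command))  # inspect first; call run_benchmark(command) to launch
```

Passing the arguments as a list avoids shell quoting problems, although a password that contains a comma or colon would still need care because of the client-options syntax.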
  5. Analyze Results: Review collected metrics to evaluate cluster performance. Use insights to identify bottlenecks, optimize configurations, or compare different setups for improvements. The OpenSearch Benchmark summary report provides metrics related to the performance of your cluster; how you compare and use those metrics depends on your use case.

OpenSearch Benchmark results are stored in memory or in external storage, and results can be found in the ~/.benchmark/benchmarks/test_executions/<test_execution_id> directory. Results are named in accordance with the test_execution_id of the most recent workload test.
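
To compare two runs side by side, one option is to have each run write a CSV report and then join the reports on metric and task. The sketch below assumes the CSV has `Metric`, `Task`, `Value`, and `Unit` columns; verify this against a report produced by your version before relying on it. The task name `range` and the helper names are illustrative, and the two sample values are the range query p99 latencies reported later in this article.

```python
import csv
import io


def load_results(csv_text):
    """Map (metric, task) -> (value, unit) from a CSV results report."""
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row["Value"])
        except (TypeError, ValueError):
            continue  # skip rows without a numeric value
        results[(row["Metric"], row["Task"])] = (value, row["Unit"])
    return results


def side_by_side(first, second):
    """Return (metric, task, first_value, second_value, unit) for shared keys."""
    rows = []
    for key in sorted(first.keys() & second.keys()):
        first_value, unit = first[key]
        second_value, _ = second[key]
        rows.append((key[0], key[1], first_value, second_value, unit))
    return rows


# Inline samples stand in for two report files in this sketch.
m7g_csv = "Metric,Task,Value,Unit\n99th percentile latency,range,50.66,ms\n"
om2_csv = "Metric,Task,Value,Unit\n99th percentile latency,range,33.46,ms\n"

for metric, task, m7g, om2, unit in side_by_side(
    load_results(m7g_csv), load_results(om2_csv)
):
    print(f"{task} / {metric}: M7g={m7g} {unit}, OM2={om2} {unit}")
```

Joining on the (metric, task) pair keeps only measurements present in both reports, so a task that failed or was skipped in one run does not produce a misleading comparison.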

Performance Benchmark Analysis: OM2 vs M7g for Amazon OpenSearch

In this article, we conducted a performance comparison between two different configurations of OpenSearch Service:

  • Configuration 1 – Cluster manager nodes and two data nodes of OpenSearch-specialized OM2 instances
  • Configuration 2 – Cluster manager nodes and two data nodes of the newer general-purpose M7g instances

In both configurations, we use the same number and type of cluster manager nodes: three c6g.xlarge. You can set up different configurations with the supported instance types in OpenSearch Service to run performance benchmarks.

The following table summarizes our OpenSearch Service configuration details:

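| Component | OM2 Cluster | M7g Cluster |
|---|---|---|
| CLUSTER MANAGER NODES | | |
| Instance Type | c6g.large | c6g.large |
| Count | 3 | 3 |
| DATA NODES | | |
| Instance Type | OM2.2xlarge | M7g.2xlarge |
| Count | 2 | 2 |
| vCPUs per node | 8 | 8 |
| Memory per node | 32 GiB | 32 GiB |
| STORAGE CONFIGURATION | | |
| Volume Type | gp3 | gp3 |
| Size | 500 GB | 500 GB |
| IOPS | 3000 | 3000 |
| OPENSEARCH CONFIGURATION | | |
| Version | 2.19 | 2.19 |
| Shards per index | 5 | 5 |
| Replicas | 1 | 1 |
| JVM Heap | 8 GB | 8 GB |
| MONITORING | | |
| CloudWatch Metrics | Enabled | Enabled |
| Metric Frequency | 1 minute | 1 minute |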

Now let’s examine the performance details between the two configurations.

Performance Benchmark Comparison

The http_logs dataset contains HTTP server logs from the 1998 World Cup website between April 30, 1998, and July 26, 1998. Each request consists of a timestamp field, client ID, object ID, size of the request, method, status, and more. The uncompressed size of the dataset is 31.1 GB with 247 million JSON documents. The amount of load sent to both domain configurations is identical. The following table displays the amount of time taken to run various aspects of an OpenSearch workload on our two configurations.

The full comparison, with a representative use case for each metric:

| Metric Type | Metric | Description | Use Case | M7g | OM2 | % Change (OM2 vs. M7g) | Winner |
|---|---|---|---|---|---|---|---|
| Indexing Performance | | | | | | | |
| Indexing Time | Primary shards | Total time for document indexing across primary shards | Log ingestion, Document processing | 87.03 min | 65.68 min | -24.54% | OM2 ✅ |
| Flush Time | Primary shards | Time to persist indexed data to disk | Large batch updates, Data migrations | 8.57 min | 5.06 min | -41.03% | OM2 ✅ |
| GC Time | Young Gen | Garbage collection time for recent objects | Memory-intensive operations | 16.50 sec | 7.29 sec | -55.83% | OM2 ✅ |
| Query Performance | | | | | | | |
| Bulk Index | p99 latency | Time for 99% of bulk index operations | ETL processes, Data imports | 300.02 ms | 773.71 ms | +157.87% | M7g ✅ |
| Query Throughput | Mean | Queries processed per second | High-traffic search applications | 16.33 ops/s | 0.025 ops/s | -99.85% | M7g ✅ |
| Match All | p99 latency | Response time for full index scans | System health checks, Analytics | 34.25 ms | 31.87 ms | -6.95% | OM2 ✅ |
| Term Query | p99 latency | Exact match query response time | Product catalog search, User lookups | 35.14 ms | 29.32 ms | -16.56% | OM2 ✅ |
| Range Query | p99 latency | Range-based query response time | Time-series data, Price filters | 50.66 ms | 33.46 ms | -33.95% | OM2 ✅ |
| Hourly Aggregation | p99 latency | Hourly data grouping response time | Metrics dashboards, Usage reports | 72.77 ms | 49.46 ms | -32.02% | OM2 ✅ |
| Multi-term Aggregation | p99 latency | Complex aggregation response time | Business analytics, Complex reporting | 2468.37 ms | 2200.92 ms | -10.83% | OM2 ✅ |

The performance comparison between M7g and OM2 instances reveals distinct strengths for different use cases. OM2 delivered lower p99 latency on every search operation tested, from match-all queries (about 7% lower) to range queries and hourly aggregations (roughly 32% to 34% lower). It also completed indexing in about 25% less time and spent less than half as long in young-generation garbage collection. M7g, however, showed lower p99 bulk index latency (300.02 ms versus 773.71 ms) and much higher mean query throughput (16.33 ops/s versus 0.025 ops/s). This suggests using OM2 for production environments requiring consistent low-latency query performance, while M7g might be more suitable for development environments, batch processing, and cost-sensitive workloads where raw throughput is prioritized over query complexity.
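
The % Change and Winner columns in the comparison table can be reproduced from the raw M7g and OM2 values. The sketch below treats M7g as the baseline and assumes that lower is better for times and latencies and higher is better for throughput; the function names are illustrative. Because the inputs are the rounded figures from the table, recomputed percentages can differ from the published ones by a few hundredths of a point.

```python
def percent_change(baseline, contender):
    """Relative change of contender versus baseline, in percent."""
    return (contender - baseline) / baseline * 100.0


def winner(m7g, om2, higher_is_better=False):
    """Name the better instance type for one metric."""
    if m7g == om2:
        return "tie"
    om2_wins = (om2 > m7g) if higher_is_better else (om2 < m7g)
    return "OM2" if om2_wins else "M7g"


# (label, M7g value, OM2 value, higher_is_better), values from the table.
rows = [
    ("Indexing time (min)", 87.03, 65.68, False),
    ("Bulk index p99 latency (ms)", 300.02, 773.71, False),
    ("Mean query throughput (ops/s)", 16.33, 0.025, True),
    ("Range query p99 latency (ms)", 50.66, 33.46, False),
]

for label, m7g, om2, higher_is_better in rows:
    change = percent_change(m7g, om2)
    print(f"{label}: {change:+.2f}% -> {winner(m7g, om2, higher_is_better)}")
```

Making the direction of "better" explicit per metric matters here: a negative change is a win for OM2 on latency rows but a loss on the throughput row.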

Conclusion

Our benchmarking analysis of OM2 and M7g instances in OpenSearch clusters reveals clear performance patterns to guide infrastructure decisions. OM2 instances demonstrate superior performance in complex query operations, memory management, and consistent low-latency responses, making them ideal for production environments with demanding search and analytics workloads. M7g instances excel in bulk operations and high-throughput scenarios, offering a cost-effective solution for development environments and batch processing tasks. The significant performance variations across metrics emphasize the importance of aligning instance selection with specific workload requirements. Organizations should carefully evaluate their use cases, considering factors like query complexity, throughput needs, and cost constraints, to choose the most suitable instance type or consider a hybrid approach for optimal performance.


About the authors
Jatinder Singh is a Senior Technical Account Manager at AWS who specializes in helping customers with their cloud migration and innovation endeavors. He brings his expertise and enthusiasm for technology to help clients efficiently scale their businesses so that they can focus on their core activities. Outside of work, he enjoys spending moments with his family and indulging in hobbies such as reading, culinary arts, and chess.
Puneetha Kumara is a Senior Technical Account Manager at AWS, with over 15 years of industry experience, including roles in cloud architecture, systems engineering, and container orchestration.
Manpreet Kour, an experienced Senior Technical Account Manager at AWS, is dedicated to ensuring customer satisfaction. Her approach involves a deep understanding of customer objectives, aligning them with software capabilities, and effectively driving customer success. Outside of her professional endeavors, she enjoys traveling and spending quality time with her family.