How can I improve the indexing performance on my OpenSearch Service cluster?

I want to optimize indexing operations in Amazon OpenSearch Service for maximum ingestion throughput.

Short description

To improve indexing performance on your OpenSearch Service cluster, take the following actions:

  • Distribute shards evenly across the data nodes for the index that you ingest into.
  • Increase the refresh interval to 60 seconds or more.
  • Change the replica count to zero.
  • Experiment to find the optimal bulk request size.
  • Use an instance type with SSD instance store volumes, such as I3.
  • Reduce the response size.
  • Increase the flush threshold size.

Resolution

Distribute shards evenly across the data nodes for the index that you ingest into

By default, OpenSearch Service distributes shards based on shard count, not shard size. Use the following formula to understand how OpenSearch Service distributes shards:

The number of shards per node = the number of shards for the index / the number of data nodes

For example, if there are 24 shards in the index and there are eight data nodes, then OpenSearch Service assigns three shards to each node.
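The formula above can be checked with a few lines of shell arithmetic. This is a minimal sketch that uses the counts from the example; the variable names are illustrative, and the remainder check assumes that shards are balanced by count only.

```shell
# Values from the example above; replace them with your own counts.
shards=24       # total shards in the index
data_nodes=8    # data nodes in the domain

shards_per_node=$(( shards / data_nodes ))
remainder=$(( shards % data_nodes ))

echo "Shards per node: ${shards_per_node}"
if [ "${remainder}" -ne 0 ]; then
  # Assumption: count-based balancing places the leftover shards one per node.
  echo "Uneven: ${remainder} node(s) hold one extra shard"
fi
```

A nonzero remainder means that the shard count isn't a multiple of the data node count, so some nodes carry more of the indexing load than others.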

To distribute the load evenly, consider both shard size and shard count. For more information, see Get started with Amazon OpenSearch Service: How many shards do I need?

Increase the refresh interval to 60 seconds or more

Refreshing your OpenSearch Service index makes newly indexed documents available for search. A refresh operation uses the same resources that indexing threads use.

The default refresh interval is one second. When you increase the refresh interval, the data node makes fewer API calls. To prevent 429 errors, it's a best practice to increase the refresh interval.

Note: The default refresh interval is one second for indices that received one or more search requests in the last 30 seconds. For more information about the updated default interval, see Refresh API on the Elasticsearch website.
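As a minimal sketch, you can raise the refresh interval through the index settings API. The function name set_refresh_interval is hypothetical, and os-endpoint and index-name are placeholders. The call assumes that your domain accepts the request as written, so add authentication if your access policy requires it.

```shell
# Hypothetical helper: set index.refresh_interval on one index.
# Arguments: endpoint, index name, interval (for example, 60s).
set_refresh_interval() {
  endpoint="$1"; index="$2"; interval="$3"
  curl -XPUT "${endpoint}/${index}/_settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index\":{\"refresh_interval\":\"${interval}\"}}"
}

# Usage (placeholders): set_refresh_interval "os-endpoint" "index-name" "60s"
```

After the heavy ingestion is complete, you can call the function again with the interval that your search workload needs.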

Change the replica count to zero

If you anticipate heavy indexing, then set the index.number_of_replicas value to 0. Each replica duplicates the indexing process. When you turn off the replicas, you improve indexing performance. After the heavy indexing is complete, restore the replica count on the index.

Important: If a node fails when replicas are off, then you might lose data. Turn off the replicas only if you can tolerate data loss for a short duration.
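The following sketch shows one way to turn off replicas before a heavy load and to restore them afterward. The function name set_replica_count is hypothetical, os-endpoint and index-name are placeholders, and the restored value of 1 is only an assumption about your previous setting.

```shell
# Hypothetical helper: set index.number_of_replicas on one index.
# Arguments: endpoint, index name, replica count.
set_replica_count() {
  endpoint="$1"; index="$2"; replicas="$3"
  curl -XPUT "${endpoint}/${index}/_settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index\":{\"number_of_replicas\":${replicas}}}"
}

# Usage (placeholders):
#   set_replica_count "os-endpoint" "index-name" 0   # before heavy indexing
#   set_replica_count "os-endpoint" "index-name" 1   # after indexing; use your previous count
```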

Experiment to find the optimal bulk request size

Start with a bulk request size of 5 MiB to 15 MiB. Then, slowly increase the request size until the indexing performance stops improving. For more information, see Using and sizing bulk requests on the Elasticsearch website.

Note: Some instance types limit bulk requests to 10 MiB. For more information, see Network quotas.
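When you experiment, it can help to confirm the size of each bulk payload before you send it. The following sketch assumes that the bulk request body is stored in a local file. The function name check_bulk_size and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper: report where a bulk payload file falls relative to
# the suggested starting range of 5 MiB to 15 MiB.
check_bulk_size() {
  bytes=$(( $(wc -c < "$1") ))      # arithmetic expansion strips any padding from wc
  min=$(( 5 * 1024 * 1024 ))
  max=$(( 15 * 1024 * 1024 ))
  if [ "${bytes}" -lt "${min}" ]; then
    echo "below"
  elif [ "${bytes}" -gt "${max}" ]; then
    echo "above"
  else
    echo "within"
  fi
}

# Usage (hypothetical file name): check_bulk_size bulk-batch.ndjson
```

If your instance type limits requests to 10 MiB, then lower the upper bound in the function to match.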

Use an instance type that has SSD instance store volumes

I3 instances provide fast, local non-volatile memory express (NVMe) storage. I3 instances deliver better ingestion performance than instances that use General Purpose SSD (gp2) Amazon Elastic Block Store (Amazon EBS) volumes. For more information, see Petabyte scale in Amazon OpenSearch Service.

Reduce the response size

To reduce the size of your OpenSearch Service responses, use the filter_path parameter to exclude unnecessary fields.

Important: Don't filter out fields that you need when you identify or retry failed requests. Those fields can vary by client.

In the following example, the response excludes the took field and the _index and _type fields of each index action result:

curl -XPOST "es-endpoint/index-name/type-name/_bulk?pretty&filter_path=-took,-items.index._index,-items.index._type" -H 'Content-Type: application/json' -d'{ "index" : { "_index" : "test2", "_id" : "1" } }
{ "user" : "testuser" }
{ "update" : {"_id" : "1", "_index" : "test2"} }
{ "doc" : {"user" : "example"} }
'

For more information, see Reducing response size.

Increase the flush threshold size

By default, index.translog.flush_threshold_size is set to 512 MB. This means that OpenSearch Service flushes the translog when it reaches 512 MB. For more information, see Translog on the Elasticsearch website. The weight of the indexing load determines how often the translog is flushed. When you increase index.translog.flush_threshold_size, the node flushes the translog less frequently. Because flushes are resource-intensive operations, fewer flushes improve indexing performance.

When you increase the flush threshold size, the OpenSearch Service cluster also creates a few large segments instead of many small segments. Large segments merge less often, so more threads are available for indexing instead of merging.

Note: An increase in index.translog.flush_threshold_size can also increase the time that each flush takes to complete. If a shard fails, then recovery takes more time because the translog is larger.

Before you increase index.translog.flush_threshold_size, get the current flush operation statistics. Run the following API operation:

curl -XGET "os-endpoint/index-name/_stats/flush?pretty"

Note: Replace os-endpoint with your OpenSearch Service endpoint and index-name with your index.

In the output, check the number of flushes and the total time. The following example output shows that there are 124 flushes that took 17,690 milliseconds:

{
  "flush": {
    "total": 124,
    "total_time_in_millis": 17690
  }
}
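To compare flush behavior before and after a change, it can help to reduce these two numbers to an average time per flush. The following sketch uses the values from the example output above; the variable names are illustrative.

```shell
# Values copied from the example output; replace them with your own statistics.
flush_total=124
flush_time_ms=17690

# Shell arithmetic uses integer division, so the result is truncated.
avg_ms=$(( flush_time_ms / flush_total ))
echo "Average time per flush: ${avg_ms} ms"
```

For the example values, the average is about 142 milliseconds per flush.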

To increase the flush threshold size, run the following API operation:

curl -XPUT "os-endpoint/index-name/_settings?pretty" -H 'Content-Type: application/json' -d '{"index":{"translog.flush_threshold_size":"1024MB"}}'

Note: Replace os-endpoint with your OpenSearch Service endpoint and index-name with your index. Also, in the preceding operation, the flush threshold size is 1024 MB. It's a best practice to use this size for instances that have more than 32 GB of memory. Replace 1024 with the right threshold size for your OpenSearch Service domain.

To verify that the flush activity changed, run the _stats API operation again:

curl -XGET "os-endpoint/index-name/_stats/flush?pretty"

Note: Replace os-endpoint with your OpenSearch Service endpoint and index-name with your index.

It's a best practice to increase the index.translog.flush_threshold_size for the current index only. After you confirm the outcome, apply the changes to the index template.
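As a rough sketch, the template step might look like the following. It assumes that your domain supports the composable _index_template API; older Elasticsearch versions use the legacy _template API instead. The function name, template name, and index pattern are hypothetical. A PUT request replaces the whole template, so merge this setting into your existing template body instead of sending it alone.

```shell
# Hypothetical helper: create or replace an index template that sets the
# flush threshold for new indices that match a pattern.
# Arguments: endpoint, template name, index pattern, threshold size.
put_flush_threshold_template() {
  endpoint="$1"; template="$2"; pattern="$3"; size="$4"
  curl -XPUT "${endpoint}/_index_template/${template}?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index_patterns\":[\"${pattern}\"],\"template\":{\"settings\":{\"index.translog.flush_threshold_size\":\"${size}\"}}}"
}

# Usage (placeholder and hypothetical names):
#   put_flush_threshold_template "os-endpoint" "ingest-template" "logs-*" "1024MB"
```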

Related information

Operational best practices for Amazon OpenSearch Service

AWS OFFICIAL
Updated 8 months ago