Force merge or rebuild index to remove deleted documents in AWS OpenSearch

1

Hi everyone. We have an OpenSearch 1.3 domain in which we create a new index every month. All indexes up to one year old receive both inserts and updates. Because of that, each index holds a large number of deleted documents (between 40 and 50 million). We are looking for a way to remove those deleted documents and improve performance. There appear to be two options:

  1. Force merge, which removes deleted documents and merges the segments.
  2. Rebuild the index under a different name but with the same alias. This method does not merge segments.

So the question is: which method should we use? I have read in some blogs that a force merge blocks new requests against the cluster. Is this true? If I rebuild the index, how can I merge the segments? Do we need to merge the segments at all, or is removing the deleted documents enough to improve performance?

2 Answers
1

Hello,

In Amazon OpenSearch Service, an index is made up of Lucene segments in which the data is stored. Lucene segments are immutable, so deleting a document adds a delete marker instead of modifying the segment.

For better performance, the overall goal is to have fewer segments in the OpenSearch domain. OpenSearch Service automatically runs merge operations according to the merge policy setting. During a merge, smaller segments are merged into larger segments to maintain the index size. Documents marked for deletion are also expunged to free up additional disk space.

I understand your OpenSearch domain has a large number of documents marked for deletion. You can reclaim the disk space immediately using one of the following options:
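To see how much of each index is made up of delete markers, the output of GET _cat/indices?format=json&h=index,docs.count,docs.deleted can be summarized. The following is only a minimal sketch: the sample rows are made-up numbers, and fetching the response from the domain (including authentication) is left out.

```python
def deleted_ratio(cat_rows):
    """Map each index name to the fraction of its documents that are delete markers.

    cat_rows is the parsed JSON list returned by
    GET _cat/indices?format=json&h=index,docs.count,docs.deleted
    where the counts arrive as strings (or null for closed indexes).
    """
    ratios = {}
    for row in cat_rows:
        live = int(row.get("docs.count") or 0)
        deleted = int(row.get("docs.deleted") or 0)
        total = live + deleted
        ratios[row["index"]] = deleted / total if total else 0.0
    return ratios


# Hypothetical sample rows, not taken from a real domain.
sample = [
    {"index": "test-2023-1", "docs.count": "150000000", "docs.deleted": "50000000"},
    {"index": "test-2023-2", "docs.count": "0", "docs.deleted": "0"},
]

for name, ratio in deleted_ratio(sample).items():
    print(f"{name}: {ratio:.0%} deleted")
```

Indexes with a high ratio are the likeliest candidates for expunging deletes or for a rebuild.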

  1. Use the force merge API with the only_expunge_deletes parameter to clear the deleted documents within an index.
  2. Delete an entire index instead of deleting individual documents. Deleting an index doesn't create any delete markers. Instead, the delete index API clears the index metadata, and disk space is immediately reclaimed.
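As a rough illustration of the first option, the request is a POST to /<index>/_forcemerge with only_expunge_deletes=true. The sketch below only builds the URL; the endpoint and index name are placeholders, and sending the request with whatever authentication your domain requires is not shown.

```python
from urllib.parse import quote, urlencode


def force_merge_url(endpoint: str, index: str, only_expunge_deletes: bool = True) -> str:
    """Build the URL for POST /<index>/_forcemerge on the given domain endpoint."""
    query = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    return f"{endpoint.rstrip('/')}/{quote(index, safe='')}/_forcemerge?{query}"


# Placeholder endpoint and index name, for illustration only.
print("POST", force_merge_url("https://my-domain.example.com", "test-2023-1"))
```

With only_expunge_deletes=true the merge targets segments that contain deleted documents, rather than merging the whole index down to a fixed number of segments.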

If you consider using force merge for your use case, please be aware of the following when performing the force merge operation:

  • Perform force merge on your cluster only when there is enough free storage space. This action is a resource-intensive operation.
  • The force merge operation triggers an I/O intensive process and blocks all new requests to your cluster until the merge is complete.
  • Only call the force merge operation against read-only indices, when no additional data is being written to the index. Calling force merge against a read/write index can cause very large segments to be produced (>5 GB per segment). When this happens, the automatic merge policy doesn't consider these very large segments for future merges until the segments contain mostly deleted documents. As a result, disk usage increases and search performance worsens.

You can also consider a rebuild/reindex operation, which should help with deleted documents: the reindex operation indexes the data into a destination index, and once the data is reindexed you can delete the previous index. As discussed above, deleting an index doesn't create any delete markers, and disk space is immediately reclaimed.

To merge segments explicitly, the only option available is the force merge API. If you want to avoid running force merge, you can let OpenSearch Service merge automatically according to the merge policy setting, which periodically merges smaller segments into larger ones.
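The rebuild could look roughly like the sequence below: copy with _reindex, move the alias in one _aliases call, then delete the old index. This is a sketch under assumptions: the index and alias names are hypothetical, the destination index is assumed to already exist with the right mappings, and the requests are only printed, not sent.

```python
import json
from urllib.parse import quote


def rebuild_index_requests(old_index: str, new_index: str, alias: str):
    """Return the (method, path, body) requests for a reindex-and-swap rebuild."""
    reindex_body = {"source": {"index": old_index}, "dest": {"index": new_index}}
    # Both alias actions go in one request so the alias never points at nothing.
    alias_body = {
        "actions": [
            {"add": {"index": new_index, "alias": alias}},
            {"remove": {"index": old_index, "alias": alias}},
        ]
    }
    return [
        ("POST", "/_reindex", reindex_body),
        ("POST", "/_aliases", alias_body),
        ("DELETE", "/" + quote(old_index, safe=""), None),
    ]


# Hypothetical names, for illustration only.
for method, path, body in rebuild_index_requests("test-2023-1", "test-2023-1-v2", "test-2023"):
    print(method, path, json.dumps(body) if body is not None else "")
```

Updates that reach the old index while the copy is running may not be carried over to the new one, so it is probably safest to pause writes to that index for the duration of the rebuild.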

[+] https://repost.aws/knowledge-center/opensearch-deleted-documents

AWS
SUPPORT ENGINEER
Rajat_C
answered a year ago
0

When using /_forcemerge?only_expunge_deletes=true on a read-only index, we get an error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "cluster_block_exception",
        "reason" : "index [test-2023-1] blocked by: [FORBIDDEN/5/index read-only (api)];"
      }
    ],
    "type" : "cluster_block_exception",
    "reason" : "index [test-2023-1] blocked by: [FORBIDDEN/5/index read-only (api)];"
  },
  "status" : 403
}
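A possible explanation, offered as an assumption rather than a confirmed diagnosis: the "index read-only (api)" block comes from the index.blocks.read_only setting, which also blocks metadata changes, and force merge appears to be rejected under it. The index.blocks.write setting blocks only document writes, so one thing to try is swapping the blocks before merging. The sketch below just lists the requests; whether your domain permits these settings is not verified here.

```python
import json
from urllib.parse import quote


def expunge_with_write_block(index: str, restore_writes: bool = False):
    """Return (method, path, body) requests that swap read_only for a write block, then expunge deletes."""
    path = "/" + quote(index, safe="")
    steps = [
        # Lift the full read-only block, which is what the 403 above reports.
        ("PUT", f"{path}/_settings", {"index.blocks.read_only": False}),
        # Block document writes only, so nothing is indexed during the merge.
        ("PUT", f"{path}/_settings", {"index.blocks.write": True}),
        ("POST", f"{path}/_forcemerge?only_expunge_deletes=true", None),
    ]
    if restore_writes:
        # Optional: allow writes again once the merge has finished.
        steps.append(("PUT", f"{path}/_settings", {"index.blocks.write": False}))
    return steps


for method, path, body in expunge_with_write_block("test-2023-1"):
    print(method, path, json.dumps(body) if body is not None else "")
```

The two settings changes are kept as separate requests so that lifting the read-only block does not depend on any other setting being accepted in the same call.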

Willian
answered a month ago
