Hello,
In Amazon OpenSearch Service, an index is made up of Lucene segments, which store the data. Lucene segments are immutable, so deleting a document adds a delete marker instead of modifying the segment.
For better performance, the overall goal is to have fewer segments in the OpenSearch domain. OpenSearch Service automatically runs the merge API operation according to the merge policy setting. During a merge, smaller segments are merged into larger segments to keep the index size under control. Documents marked for deletion are also expunged, which frees up additional disk space.
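To make the delete-marker and merge behavior concrete, here is a toy Python model. It is a simplification for illustration only, not how Lucene actually stores segments or delete markers, and the names `Segment` and `ToyIndex` are hypothetical. Deleting only records a marker, so the stored document count stays the same until a merge writes a new segment without the marked documents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    # Document IDs stored in this segment; the tuple is never modified.
    docs: tuple[str, ...]


@dataclass
class ToyIndex:
    segments: list[Segment] = field(default_factory=list)
    # Delete markers are kept outside the immutable segments.
    deleted: set[str] = field(default_factory=set)

    def add_segment(self, docs):
        self.segments.append(Segment(tuple(docs)))

    def delete(self, doc_id):
        # Deleting only records a marker; no segment is rewritten.
        self.deleted.add(doc_id)

    def stored_count(self):
        # Documents physically present on "disk", including marked ones.
        return sum(len(s.docs) for s in self.segments)

    def live_count(self):
        return sum(1 for s in self.segments for d in s.docs if d not in self.deleted)

    def merge_all(self):
        # A merge writes one new segment containing only live documents and
        # drops the old segments, so marked documents are expunged here.
        live = tuple(d for s in self.segments for d in s.docs if d not in self.deleted)
        self.segments = [Segment(live)]
        self.deleted.clear()
```

For example, with two segments holding five documents and two of them deleted, `stored_count()` is still 5 until `merge_all()` runs, after which a single segment holds the 3 live documents.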
I understand that your OpenSearch domain has a large number of documents marked for deletion. You can reclaim the disk space immediately using the following options:
- Use the force merge API with the only_expunge_deletes parameter to clear the deleted documents from an index.
- Delete the index instead of deleting individual documents. Deleting an index doesn't create any delete markers. Instead, the delete index API clears the index metadata, and the disk space is reclaimed immediately.
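The two options above can be sketched in Python with the standard library. This is a minimal sketch under assumptions: the endpoint `https://my-domain.example.com` is hypothetical, the input rows are assumed to come from `GET _cat/indices?format=json&h=index,docs.count,docs.deleted` (which returns counts as strings), the 0.2 threshold is an illustrative cut-off rather than an OpenSearch default, and authentication (for example SigV4 or fine-grained access control) is omitted. The functions only build the requests; they do not send them.

```python
import urllib.request
from urllib.parse import quote


def deleted_ratio(row):
    # _cat/indices returns counts as strings, or null for closed indices.
    live = int(row.get("docs.count") or 0)
    deleted = int(row.get("docs.deleted") or 0)
    total = live + deleted
    return deleted / total if total else 0.0


def indices_to_clean(cat_rows, threshold=0.2):
    # Hypothetical helper: pick indices whose share of deleted docs is high.
    return [r["index"] for r in cat_rows if deleted_ratio(r) >= threshold]


def expunge_deletes_request(endpoint, index):
    # Builds (does not send) POST /<index>/_forcemerge?only_expunge_deletes=true
    url = f"{endpoint}/{quote(index)}/_forcemerge?only_expunge_deletes=true"
    return urllib.request.Request(url, method="POST")


def delete_index_request(endpoint, index):
    # Builds (does not send) DELETE /<index>; the whole index is removed,
    # so no delete markers are left behind.
    return urllib.request.Request(f"{endpoint}/{quote(index)}", method="DELETE")
```

Each built request can then be sent with `urllib.request.urlopen` (or an authenticated client) once you have confirmed that force merging or deleting that index is appropriate.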
If you decide to use force merge, be aware of the following:
- Perform force merge on your cluster only when there is enough free storage space. This action is a resource-intensive operation.
- The force merge operation triggers an I/O intensive process and blocks all new requests to your cluster until the merge is complete.
- Only call the force merge operation against read-only indices, when no additional data is being written to the index. Calling force merge against a read/write index can cause very large segments to be produced (>5 GB per segment). When this happens, the automatic merge policy doesn't consider these very large segments for future merges until the segments contain mostly deleted documents. As a result, disk usage increases and search performance worsens.
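One way to make an index "read-only" for writes before force merging is sketched below. It assumes the domain behaves like upstream OpenSearch, where the index.blocks.write setting blocks only document writes, while index.blocks.read_only also blocks metadata changes and can make force merge fail with a cluster_block_exception. Treat this as an assumption to verify on your domain and version. The endpoint is hypothetical, authentication is omitted, and the requests are built but not sent.

```python
import json
import urllib.request
from urllib.parse import quote


def settings_request(endpoint, index, settings):
    # Builds (does not send) PUT /<index>/_settings with a flat JSON body.
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        f"{endpoint}/{quote(index)}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def prepare_and_expunge(endpoint, index):
    # Step 1 (assumption): stop new document writes with a write block and
    # clear the broader read_only block so metadata stays writable.
    block = settings_request(
        endpoint,
        index,
        {"index.blocks.write": True, "index.blocks.read_only": False},
    )
    # Step 2: expunge deleted documents from the now write-blocked index.
    merge = urllib.request.Request(
        f"{endpoint}/{quote(index)}/_forcemerge?only_expunge_deletes=true",
        method="POST",
    )
    return [block, merge]
```

The requests must be sent in order, and the force merge only after the settings update succeeds, so that no new data reaches the index while it is being merged.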
You can also rebuild the index with the reindex operation, which helps with deleted documents: reindex copies the data into a destination index, and once the data is reindexed you can delete the original index. As discussed above, deleting an index doesn't create any delete markers, and the disk space is reclaimed immediately.
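The reindex-then-delete approach can be sketched as two requests. This is a minimal sketch under assumptions: the endpoint and the destination index name are hypothetical, authentication is omitted, and the requests are built but not sent. Documents already marked as deleted in the source are not copied, so the destination index starts without them.

```python
import json
import urllib.request
from urllib.parse import quote


def reindex_then_delete(endpoint, source, dest):
    # Step 1: POST /_reindex copies the live documents from source to dest.
    body = json.dumps({"source": {"index": source}, "dest": {"index": dest}}).encode("utf-8")
    reindex = urllib.request.Request(
        f"{endpoint}/_reindex",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Step 2: DELETE /<source>; send this only after the reindex has
    # finished and the destination document count has been checked.
    delete = urllib.request.Request(f"{endpoint}/{quote(source)}", method="DELETE")
    return [reindex, delete]
```

Keeping the delete as a separate request makes it easy to verify the new index before the old one, and its disk space, is released.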
The only way to merge segments explicitly is to run the force merge API. If you want to avoid running force merge, you can let OpenSearch Service run merges automatically according to the merge policy setting, which periodically merges smaller segments into larger ones.
https://repost.aws/knowledge-center/opensearch-deleted-documents
When using /_forcemerge?only_expunge_deletes=true on a read-only index, we get an error:
{ "error" : { "root_cause" : [ { "type" : "cluster_block_exception", "reason" : "index [test-2023-1] blocked by: [FORBIDDEN/5/index read-only (api)];" } ], "type" : "cluster_block_exception", "reason" : "index [test-2023-1] blocked by: [FORBIDDEN/5/index read-only (api)];" }, "status" : 403 }