Elasticsearch reindex - self-hosted to AWS - timing out, consistently failing

I have a set of what I presumed to be fairly small Elasticsearch indices in a self-hosted cluster of EC2 instances. I'm in the middle of migrating this data into an AWS-managed Elasticsearch cluster, and I've been having trouble getting the reindex tasks to consistently reproduce the documents in the target cluster.

My cluster has about 330,000 documents in it. My approach has been to expose the source cluster to the destination cluster and issue a reindex operation on the destination cluster, using some scripts written in Ruby:

  def reindex(index:)
    logger.info("reindexing #{index}")

    task = dest_es.reindex(
      body: {
        source: {
          remote: {
            host: source,
            username: source_username,
            password: source_password,
            socket_timeout: '5m',
            connect_timeout: '30s',
            external: true
          },
          index: index
        },
        dest: { index: index },
        conflicts: "proceed"
      },
      refresh: true,
      timeout: '3m',
      wait_for_completion: wait_for_completion? # set to false by default
    )

    logger.info(task)
  end

This task, however, never reindexes 100% of the documents, and frequently times out with the task status stating:

{
  "completed" : true,
  "task" : {
...
    "type" : "transport",
    "action" : "indices:data/write/reindex",
    "status" : {
      "total" : 317554,
      "updated" : 0,
      "created" : 184000,
      "deleted" : 0,
      "batches" : 184,
      "version_conflicts" : 0,
      "noops" : 0,
      "retries" : {
        "bulk" : 0,
        "search" : 0
      },
      "throttled_millis" : 0,
      "requests_per_second" : -1.0,
      "throttled_until_millis" : 0
    },
    "description" : """
...
""",
    "start_time_in_millis" : 1660058838837,
    "running_time_in_nanos" : 312625819581,
    "cancellable" : true,
    "headers" : { }
  },
  "error" : {
    "type" : "socket_timeout_exception",
    "reason" : null
  }
}
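
For reference, the figures in that task document can be read off with a small helper. This is only a sketch: the method name summarize_reindex_task is hypothetical, and the hash below copies just the fields shown above. It sums created, updated and deleted as a progress estimate against total, and converts the running time to seconds.

```ruby
# Hypothetical helper: summarise a reindex task document shaped like the
# JSON above (string keys, as a parsed JSON response would have).
def summarize_reindex_task(task_doc)
  status = task_doc.fetch('task').fetch('status')
  # Progress estimate: documents created, updated or deleted so far.
  processed = %w[created updated deleted].sum { |key| status.fetch(key, 0) }
  total = status.fetch('total')
  {
    processed: processed,
    total: total,
    fraction: total.zero? ? 0.0 : processed.to_f / total,
    batches: status['batches'],
    seconds: task_doc['task']['running_time_in_nanos'] / 1_000_000_000.0,
    error: task_doc.dig('error', 'type')
  }
end

# Only the fields shown in the question are reproduced here.
task_doc = {
  'completed' => true,
  'task' => {
    'status' => {
      'total' => 317_554, 'updated' => 0, 'created' => 184_000,
      'deleted' => 0, 'batches' => 184
    },
    'running_time_in_nanos' => 312_625_819_581
  },
  'error' => { 'type' => 'socket_timeout_exception', 'reason' => nil }
}

summary = summarize_reindex_task(task_doc)
puts summary
```

With these numbers, 184 batches produced 184,000 documents, i.e. 1,000 per batch, and the task ran for a little over 312 seconds, which is close to the configured 5m socket_timeout. The output alone does not establish whether those two are related.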

If I try to break this reindex operation into multiple tasks running asynchronously, I can issue a number of async tasks that each reindex, say, 2000 documents at a time. Looking at POST /.tasks/_search, they all seem to complete without issue, but I still can't quite get to 100%. In fact, the proportion of documents copied varies between 60% and 99%.
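
One thing that could be tried here, sketched below under assumptions, is setting an explicit batch size through source.size in the reindex body and then comparing document counts on both sides. The method names remote_reindex_body and missing_documents are hypothetical, the batch size of 500 is an arbitrary illustration, and the two clients are assumed to respond to count(index:) with a hash containing 'count', as the elasticsearch-ruby client does. The remote section passes only the host, credentials and the two timeouts.

```ruby
# Hypothetical helper: build a reindex-from-remote body with an explicit
# scroll batch size (source.size). 500 is an arbitrary example value.
def remote_reindex_body(index:, host:, username:, password:, batch_size: 500)
  {
    source: {
      remote: {
        host: host,
        username: username,
        password: password,
        socket_timeout: '5m',
        connect_timeout: '30s'
      },
      index: index,
      size: batch_size
    },
    dest: { index: index },
    conflicts: 'proceed'
  }
end

# Hypothetical helper: how many documents the destination is short by.
# Assumes both clients answer count(index:) with { 'count' => Integer }.
def missing_documents(source_es:, dest_es:, index:)
  source_es.count(index: index)['count'] - dest_es.count(index: index)['count']
end

# Usage with real clients (not executed here):
#   body = remote_reindex_body(index: index, host: source,
#                              username: source_username,
#                              password: source_password)
#   dest_es.reindex(body: body, refresh: true, wait_for_completion: false)
#   ...later, once the task has completed...
#   missing_documents(source_es: source_es, dest_es: dest_es, index: index)
```

The count on the destination is only meaningful after a refresh, and a non-zero result says that documents are missing, not why.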

Are there config settings that are generally used with reindexing across clusters? I feel like this should be a lot more straightforward than it's turning out to be.

Thanks!

asked 2 years ago, 420 views
1 Answer

Thank you for reaching out to us. Please find below ways to improve reindexing performance:

  1. Reindexing without replicas is a valid strategy. Temporarily reducing your replica shards to 0 will improve performance and reduce the time to reindex. See the section titled "Change the replica count to zero" in the referenced article[1], which also covers other optimization options worth looking into.

  2. Using slicing in the OpenSearch reindex operation is also valid. However, I do not recommend disabling replicas while using this strategy, as it will affect the availability of the shards being copied and increase the time taken. The OpenSearch documentation did not provide much information, but I found third-party documentation for Elasticsearch confirming that slices correlate with the number of shards to copy[2]. The OpenSearch documentation you linked also contains an article confirming that slices is the number of sub-tasks involved in the reindex operation, and that it can be set to 'auto' to let OpenSearch decide[3].

  3. By default, throttling (requests_per_second) appears to be set to -1 in OpenSearch when reindexing[3], which means no throttling.
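
As a rough sketch of the first suggestion in Ruby, the replica count on the destination index could be dropped to zero around the reindex and restored afterwards. The method name with_zero_replicas and the restore_to parameter are hypothetical, the client is assumed to expose indices.put_settings(index:, body:) as the elasticsearch-ruby client does, and the destination index is assumed to exist already. Slicing is left out of the sketch because the Elasticsearch reference cited as [2] states that reindexing from a remote cluster does not support manual or automatic slicing; whether that holds for the version in use is worth checking.

```ruby
# Hypothetical helper: run a block with the destination index at zero
# replicas, then put the replica count back even if the block raises.
# restore_to is an assumption; use whatever the index normally runs with.
def with_zero_replicas(client, index:, restore_to: 1)
  client.indices.put_settings(
    index: index,
    body: { index: { number_of_replicas: 0 } }
  )
  yield
ensure
  client.indices.put_settings(
    index: index,
    body: { index: { number_of_replicas: restore_to } }
  )
end

# Usage with a real client (not executed here):
#   with_zero_replicas(dest_es, index: index) do
#     dest_es.reindex(body: body, refresh: true, wait_for_completion: true)
#   end
```

Restoring replicas only makes sense once the reindex has actually finished, so this shape fits a call that waits for completion. With wait_for_completion set to false, the restore would have to happen after the task is confirmed complete.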

I hope I have answered your questions to your satisfaction. If you have any further queries or requests, please feel free to reach out and ask.

[1] How can I improve the indexing performance on my Amazon OpenSearch Service cluster? https://aws.amazon.com/premiumsupport/knowledge-center/opensearch-indexing-performance/

[2] Reindex API https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

[3] Index document https://opensearch.org/docs/latest/opensearch/rest-api/document-apis/reindex/

However, for complete guidance, please open a support case with AWS and we will help you further.

answered 2 years ago
