Questions tagged with Amazon OpenSearch Service

Elasticsearch reindex - self-hosted to AWS - timing out, consistently failing

I have a set of what I presumed to be fairly small Elasticsearch indices in a self-hosted cluster of EC2 instances. I'm in the middle of migrating this data into an AWS-managed Elasticsearch cluster, and I've been having trouble getting the reindex tasks to _consistently_ reproduce the documents in the target cluster.

My cluster has about 330,000 documents in it. My approach has been to expose the source cluster to the destination cluster and to issue a reindex operation to the destination cluster, using some scripts written in Ruby:

```ruby
def reindex(index:)
  logger.info("reindexing #{index}")
  task = dest_es.reindex(
    body: {
      source: {
        remote: {
          host: source,
          username: source_username,
          password: source_password,
          socket_timeout: '5m',
          connect_timeout: '30s',
          external: true
        },
        index: index
      },
      dest: { index: index },
      conflicts: "proceed"
    },
    refresh: true,
    timeout: '3m',
    wait_for_completion: wait_for_completion? # set to false by default
  )
  logger.info(task)
end
```

This task, however, never reindexes 100% of the documents, and frequently times out with the task status stating:

```json
{
  "completed" : true,
  "task" : {
    ...
    "type" : "transport",
    "action" : "indices:data/write/reindex",
    "status" : {
      "total" : 317554,
      "updated" : 0,
      "created" : 184000,
      "deleted" : 0,
      "batches" : 184,
      "version_conflicts" : 0,
      "noops" : 0,
      "retries" : {
        "bulk" : 0,
        "search" : 0
      },
      "throttled_millis" : 0,
      "requests_per_second" : -1.0,
      "throttled_until_millis" : 0
    },
    "description" : """ ... """,
    "start_time_in_millis" : 1660058838837,
    "running_time_in_nanos" : 312625819581,
    "cancellable" : true,
    "headers" : { }
  },
  "error" : {
    "type" : "socket_timeout_exception",
    "reason" : null
  }
}
```

If I try to break this reindex operation into multiple tasks running asynchronously, I can issue a number of async tasks to reindex, say, 2000 documents at a time. By looking at `POST /.tasks/_search` it seems that they all complete without issue, but I still can't quite get to 100%. In fact, the number of documents that arrive in the target varies between 60% and 99% of the total.

Are there config settings that are generally used with reindexing across clusters? I feel like this should be a lot more straightforward than it's turning out to be. Thanks!
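To make the shortfall in the task status above concrete, here is a minimal sketch in plain Ruby with no cluster calls. `reindex_progress` and `remote_reindex_body` are hypothetical helper names, not part of any client library. The request body mirrors the one in the question and adds `source.size`, the reindex API's per-batch document count; the value 500 is only an illustrative assumption, and whether smaller batches avoid the `socket_timeout_exception` here is untested.

```ruby
# Hypothetical helper: builds a remote-reindex request body like the one in
# the question, with an explicit batch size. `source.size` is the number of
# documents fetched per batch; 500 is an illustrative value, not a recommendation.
def remote_reindex_body(index:, host:, username:, password:, batch_size: 500)
  {
    source: {
      remote: {
        host: host,
        username: username,
        password: password,
        socket_timeout: '5m',
        connect_timeout: '30s',
        external: true
      },
      index: index,
      size: batch_size
    },
    dest: { index: index },
    conflicts: 'proceed'
  }
end

# Hypothetical helper: percentage of documents written so far, computed from
# the `status` section of a reindex task (string keys, as in the task JSON).
def reindex_progress(status)
  total = status.fetch('total')
  return 100.0 if total.zero?

  written = status.fetch('created') + status.fetch('updated')
  (written * 100.0 / total).round(2)
end

# Numbers taken from the failed task status shown in the question.
status = { 'total' => 317_554, 'created' => 184_000, 'updated' => 0 }
puts reindex_progress(status) # prints 57.94
```

With the numbers from the failed task, 184 batches of 1,000 documents had been written when the socket timed out, which is roughly 58% of the 317,554 documents the task expected to copy.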
1 answer, 0 votes, 48 views, asked 9 days ago

Which OpenSearch instance type to choose for a new web application with little data?

Amazon recommends running an OpenSearch domain that contains 3 regular nodes and 3 master nodes distributed across 3 Availability Zones. The lowest instance type that is still suitable for a production environment is the `t3.medium.search` instance type. I've run this setup for about 1.5 days, using `t3.small.search` instead of `medium`. When I looked at the bill afterwards I could see that running this setup for merely 1.5 days already cost 9 dollars. That's way too expensive for me. According to the [amazon cost calculator](https://calculator.aws/#/addService/OpenSearchService) the monthly cost for this setup would have been well over 350 dollars.

My web application will use the OpenSearch server only for serving autocomplete suggestions and finding documents whose coordinates reside within a certain geographical area. When the web application is launched, the OpenSearch server will start out with only 5 indexes containing a small number of documents, no more than 200 MB in total. Of these, only one index is used to perform geospatial queries. I don't think I need a `t3.medium.search` instance for this.

So my question is: what kind of OpenSearch instance can I start out with? The setup needs to be economical because it will take a while before my web application starts making money.

I was thinking about setting up a `t2.micro.search` domain with 2 micro master instances and 2 micro worker instances. That would cost me about 50 dollars a month in total. Could this be a good setup to start with? If so, I would like to know how I can set up a domain that uses `t2.micro.search` instances. When I go to the domain creation page in my AWS console I'm not able to select `t2.micro.search` from the instance type list. The smallest I can select is `t3.small.search`, but that's already too expensive for me because I want to run nodes in at least two Availability Zones.

I could opt for running only one `t3.small.search` master node and one worker node, which would cost 50 dollars a month as well, but then the domain is no longer highly available. If the Availability Zone it sits in goes down, I can't serve autocomplete suggestions anymore, nor can I return documents based on their coordinates.

I'd love to hear your opinions on this. Thank you
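For comparison with the calculator figure, the observed bill can be extrapolated linearly. This is only a rough sketch: `monthly_cost` is a hypothetical helper, the 30-day month is an assumption, and a real bill also depends on storage and data transfer, which this ignores.

```ruby
# Hypothetical helper: extrapolates a monthly bill from a short observation
# window, assuming the cost accrues at a constant daily rate.
def monthly_cost(observed_cost:, observed_days:, days_in_month: 30)
  (observed_cost / observed_days.to_f * days_in_month).round(2)
end

# 9 dollars over 1.5 days, as reported in the question.
puts monthly_cost(observed_cost: 9, observed_days: 1.5) # prints 180.0
```

At that rate the `t3.small.search` setup would come to about 180 dollars over a 30-day month.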
0 answers, 0 votes, 32 views, asked 23 days ago