OpenSearch Serverless search latency spikes blocking searches

We are having issues with the auto-scaling of our collection. We're trying to slowly shift traffic to the new collection, but as soon as it receives even a little traffic, search latency spikes, requests start to accumulate, and searches effectively stop working because the collection is overloaded.

I was able to trigger scaling of search OCUs by sending manual requests incrementally, up to ~2.5 requests/s, but it took about 1 hour to get from 0 to that rate, and the collection only scaled to 3 OCUs. I'm concerned we won't be able to handle rapid traffic increases.
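For anyone who wants to reproduce the incremental warm-up, here is a minimal sketch of a ramp schedule. It assumes a linear ramp, which the description above does not state. Only the 1 hour duration and the ~2.5 requests/s peak come from the post. The function name `targetRate` is made up for illustration.

```typescript
// Linear ramp (an assumption): target request rate in req/s at a given elapsed time.
function targetRate(elapsedSec: number, rampSec: number, peakRate: number): number {
  if (elapsedSec <= 0) return 0;
  if (elapsedSec >= rampSec) return peakRate;
  return peakRate * (elapsedSec / rampSec);
}

const RAMP_SEC = 3600; // about 1 hour, from the post
const PEAK_RATE = 2.5; // requests per second, from the post

// Print the target rate at a few checkpoints during the ramp.
for (const minute of [0, 15, 30, 45, 60]) {
  const rate = targetRate(minute * 60, RAMP_SEC, PEAK_RATE);
  console.log(`${minute} min -> ${rate.toFixed(3)} req/s`);
}
```

A load generator would call `targetRate` periodically and adjust its send interval accordingly. Whether a slower or faster ramp changes how the search OCUs scale is exactly the open question here.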

We exclusively use the msearch API for our searches.

We had to add timeouts and disable retries:

  • cancel_after_time_interval: '2s' on all queries
  • maxRetries: 0 on msearch
  • requestTimeout: 2000 on msearch

This lets us recover from the spikes that would otherwise clog our system, but it is only a band-aid.
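The settings listed above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code in use. The `SearchItem` type, the `buildMsearchBody` helper, and the index name `products` are hypothetical. Placing `cancel_after_time_interval` in each msearch metadata line is one possible reading of "on all queries"; it could equally be set as a request-level parameter.

```typescript
// Hypothetical shape of one search inside an msearch call.
interface SearchItem {
  index: string;                  // target index (placeholder name)
  query: Record<string, unknown>; // query DSL for this search
}

// Values taken from the post.
const CANCEL_AFTER = "2s";
const transportOptions = { requestTimeout: 2000, maxRetries: 0 };

// Build the alternating metadata/body array that msearch expects.
// Assumption: the cancellation interval is set per search in the metadata line.
function buildMsearchBody(items: SearchItem[]): Record<string, unknown>[] {
  const lines: Record<string, unknown>[] = [];
  for (const item of items) {
    lines.push({ index: item.index, cancel_after_time_interval: CANCEL_AFTER });
    lines.push({ query: item.query });
  }
  return lines;
}

const body = buildMsearchBody([
  { index: "products", query: { match_all: {} } },
  { index: "products", query: { match: { title: "shoes" } } },
]);

// With the OpenSearch JavaScript client this would be passed roughly as:
//   await client.msearch({ body }, transportOptions);
console.log(JSON.stringify(body), transportOptions);
```

With `maxRetries: 0`, a request that times out after 2000 ms fails immediately instead of being retried against a collection that is already overloaded.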

Our setup is CloudFront -> API Gateway -> Lambda -> OpenSearch Serverless collection through a VPC endpoint.

We're also having trouble finding resources on how to troubleshoot OpenSearch Serverless. Most are written for provisioned domains, such as this one: https://repost.aws/knowledge-center/opensearch-latency-spikes

  • Thanks for exploring AOSS. Can you send an email to anapat@amazon.com with details about your collection and the query you are trying to run? We can look into the issue you are facing.
