We are having issues with auto-scaling on our OpenSearch Serverless collection. We're trying to gradually shift traffic to the new collection, but as soon as it receives even a little traffic, search latency spikes, requests start to accumulate, and searches effectively stop working because the collection is overloaded.
I was able to trigger scaling of search OCUs by sending manual requests at an incrementally increasing rate, up to ~2.5 requests/s, but it took about an hour to go from 0 to that rate, and the collection only scaled to 3 OCUs.
I'm concerned we won't be able to handle rapid traffic increases.
We exclusively use the msearch API for our searches.
We had to add timeouts:
- cancel_after_time_interval: '2s' on all queries
- maxRetries: 0 on msearch
- requestTimeout: 2000 on msearch
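For anyone trying to reproduce this, here is a minimal sketch of how the three settings above might fit together in TypeScript. It only builds the msearch payload and the per-request options. The helper name buildMsearchBody, the SubSearch type, and the "products" index are hypothetical, and placing cancel_after_time_interval on each sub-search header line is an assumption about where the setting goes, not necessarily our exact code.

```typescript
// Hypothetical shape of one sub-search; names are illustrative only.
interface SubSearch {
  index: string;
  query: Record<string, unknown>;
}

// Per-request options with the values quoted above:
// fail fast after 2000 ms and never retry a timed-out msearch.
const transportOptions = { requestTimeout: 2000, maxRetries: 0 };

// Build the alternating header/body lines that msearch expects.
// Assumption: cancel_after_time_interval is set on every header line,
// so each sub-search is cancelled server-side after the interval.
function buildMsearchBody(
  searches: SubSearch[],
  cancelAfter: string = "2s"
): Record<string, unknown>[] {
  const lines: Record<string, unknown>[] = [];
  for (const s of searches) {
    lines.push({ index: s.index, cancel_after_time_interval: cancelAfter });
    lines.push({ query: s.query });
  }
  return lines;
}

const body = buildMsearchBody([
  { index: "products", query: { match: { title: "shoes" } } },
]);

// With the OpenSearch JavaScript client, this would be sent roughly as:
//   await client.msearch({ body }, transportOptions);
```

The server-side cancellation and the client-side timeout are set to the same 2 seconds here so that a query the client has given up on does not keep consuming search capacity on the collection.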
This let us recover from the spikes that would otherwise clog our system, but it is only a band-aid.
Our setup is CloudFront -> API Gateway -> Lambda -> OpenSearch Serverless collection through a VPC endpoint.
We're also having trouble finding resources on how to troubleshoot OpenSearch Serverless. Most are written for provisioned domains, such as this one: https://repost.aws/knowledge-center/opensearch-latency-spikes
Thanks for exploring AOSS. Can you send an email to anapat@amazon.com with the details about your collection information and query you are trying to run? We can look into the issue you are facing.