Thank you for providing details about your Neptune full-text search integration with OpenSearch Serverless. Let me address your concerns and provide some insights.
Regarding the 429 error you're encountering: HTTP 429 means the requests are being throttled. The circuit breaking exception suggests that the OpenSearch Serverless collection is reaching its capacity limits during the data transfer process.
The shard count discrepancy you've noticed (2 instead of 5) is likely due to OpenSearch Serverless managing sharding automatically. Unlike with managed OpenSearch Service, Serverless doesn't allow manual configuration of shard counts. The system determines the optimal number of shards based on the data volume and workload.
To address the ingestion issues:
- Increase the OCU (OpenSearch Compute Unit) limit: While you mentioned the limit was set to 10 by default, you might want to consider increasing this further if your data volume is substantial.
- Adjust batch size: The Lambda function is using a batch size of 2500. You could try reducing this to see if it helps with the throttling issues.
- Gradual data transfer: Instead of transferring all data at once, consider breaking it into smaller chunks and transferring them incrementally.
- Retry mechanism: Implement a retry mechanism with exponential backoff in your Lambda function to handle temporary throttling issues.
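As a rough illustration of the retry suggestion, here is a minimal sketch of exponential backoff with jitter. It is not the code of the Neptune-to-OpenSearch Lambda function: `ThrottledError`, `send_with_retry`, `backoff_delays`, and the `send` callable are hypothetical names, and the default delays are assumptions you would tune for your workload.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error raised by the caller's send function on HTTP 429."""


def backoff_delays(max_attempts, base=0.5, cap=30.0):
    """Upper bound (seconds) for the wait before each retry: base * 2**n, capped."""
    return [min(cap, base * 2 ** n) for n in range(max_attempts - 1)]


def send_with_retry(send, batch, max_attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call send(batch); on ThrottledError wait a random, growing delay and retry."""
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return send(batch)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # "Full jitter": pick a random wait between 0 and the current bound
            # so that parallel workers do not retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
```

Here `send` stands for whatever function actually posts a batch to the collection and raises `ThrottledError` when it receives a 429. The `sleep` parameter exists only so the waiting can be replaced in tests.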
Regarding increasing the speed of ingestion, unfortunately, the number of workers used by the Lambda function is not directly configurable. However, you can try the following:
- Increase Lambda function memory: This can indirectly increase CPU allocation and potentially improve processing speed.
- Use multiple Lambda functions: Instead of relying on a single function, you could split the workload across multiple Lambda functions, each responsible for a subset of your data.
- Optimize your graph data: Ensure your Neptune database is optimized, as this can impact the speed of data extraction.
- Monitor and adjust: Keep a close eye on CloudWatch metrics for both Neptune and OpenSearch Serverless to identify bottlenecks and adjust your approach accordingly.
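To make the idea of splitting the workload more concrete, the sketch below divides a list of records among several workers and then cuts each share into smaller batches. It is only an illustration under assumptions: `partition` and `chunked` are hypothetical helpers, and how records are actually handed to each Lambda function depends on your pipeline.

```python
def partition(items, workers):
    """Split items round-robin into one list per worker."""
    return [items[i::workers] for i in range(workers)]


def chunked(items, size):
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    records = list(range(10))          # stand-in for exported graph records
    for share in partition(records, 3):
        for batch in chunked(share, 2):
            print(batch)               # each batch would go to one invocation
```

Round-robin assignment keeps the shares close to equal in size. If your records have a natural key (for example an ID range), partitioning on that key may be easier to reason about when a share has to be replayed.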
Remember that OpenSearch Serverless is designed to handle scaling automatically, so some of the manual optimizations available in the managed service may not be applicable. Focus on working within the Serverless model's constraints and leveraging its auto-scaling capabilities.
Lastly, if these suggestions don't resolve your issues, consider reaching out to AWS support for more personalized assistance, as they can provide insights based on your specific account and usage patterns.
Sources
Amazon Neptune now supports for Amazon OpenSearch 2.3, 2.5, and Serverless
Full text search in Amazon Neptune using Amazon OpenSearch Service - Amazon Neptune
Hello,
To answer your question, we would require details that are non-public information. Please open a support case with AWS using the following link:
https://console.aws.amazon.com/support/home#/case/create
If a support case has already been created, please be assured that we will get back to you and assist you in the best way possible.
Thank you!
I have raised the OCU limit for indexing to 14. The Lambda batch size was set to 5000 (I don't know why it publishes the results in batches of 2500), and I reduced it to 4000. As for the retry mechanism, I have now added something like that to the Step Functions state machine defined by the CloudFormation stack. Hopefully I configured it correctly and it will hold. I am now waiting for the pipeline to finish, one way or another.
There is, however, something else that has been puzzling me since this morning, related to the API documented at https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-genref.html
I get a 403 Forbidden when trying to perform a PUT request. A GET works just fine.
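import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

# FTS_ENDPOINT and headers are defined elsewhere in my script.
params = '{"index.requests.cache.enable": false}'
request = AWSRequest(method="PUT", url=f"https://{FTS_ENDPOINT}/amazon_neptune2/_settings", data=params, headers=headers)
SigV4Auth(boto3.Session().get_credentials(), "aoss", "us-west-2").add_auth(request)
session = URLLib3Session()
r = session.send(request.prepare())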