Hi there,
The internal queue is essentially FIFO: requests sent to your endpoint are queued in the order the endpoint receives them. If you only have one instance behind the endpoint, your requests are processed in that same order. If you have multiple instances, requests are distributed across them, and because inference time inevitably varies from request to request, a request queued later can finish before one queued earlier. As a result, the order in which you receive your responses can differ from the order in which you sent the requests.
Note that the internal queue of an Asynchronous Endpoint maintains FIFO request ordering on a best-effort basis only; we do not guarantee this ordering in all cases.
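To illustrate why responses can arrive out of order even though the queue itself is FIFO, here is a toy simulation. It is a simplification and not how SageMaker actually schedules work: it assumes requests are assigned round-robin (request i goes to instance i mod N), that each instance handles one request at a time, and it uses made-up inference durations. The function name `completion_order` is hypothetical.

```python
def completion_order(durations, num_instances):
    """Return request indices in the order their responses complete.

    Toy model (assumption): requests leave a FIFO queue in arrival order,
    request i is assigned round-robin to instance i % num_instances, and
    each instance processes its assigned requests one after another.
    """
    busy_until = [0] * num_instances  # time at which each instance is free again
    finished = []  # (finish time, request index)
    for idx, duration in enumerate(durations):  # dequeued strictly in FIFO order
        inst = idx % num_instances
        busy_until[inst] += duration
        finished.append((busy_until[inst], idx))
    finished.sort()  # earliest finish first; ties broken by request index
    return [idx for _, idx in finished]


print(completion_order([5, 1, 1, 1], num_instances=1))  # [0, 1, 2, 3]
print(completion_order([5, 1, 1, 1], num_instances=2))  # [1, 3, 0, 2]
```

With a single instance, responses come back in submission order. With two instances, the slow request 0 occupies instance 0, so requests 1 and 3 on instance 1 finish first and request 2 waits behind request 0.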
If your requests are timing out, I suggest increasing InvocationTimeoutSeconds (refer to the documentation here) to prevent this. However, if you have already set this parameter to its maximum value of 3600 seconds, you should consider adding an autoscaling policy that monitors a metric such as ApproximateBacklogSizePerInstance to scale up the total number of instances. With more instances, the queued requests are worked through faster, which reduces the likelihood of a timeout.
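A minimal sketch of both suggestions using boto3-style clients, assuming an existing asynchronous endpoint. `invoke_endpoint_async` (on the `sagemaker-runtime` client) and `register_scalable_target` / `put_scaling_policy` (on the `application-autoscaling` client) are real boto3 operations, but the helper function names, the variant name `AllTraffic`, the instance limits, the target backlog of 5 requests per instance, and the cooldown values are hypothetical placeholders to tune for your workload. The clients are passed in rather than created inside the functions so the sketch has no hard dependency on credentials.

```python
# Upper limit for InvocationTimeoutSeconds on InvokeEndpointAsync.
MAX_INVOCATION_TIMEOUT_SECONDS = 3600


def invoke_with_max_timeout(runtime_client, endpoint_name, input_s3_uri):
    """Submit an async inference request with the largest allowed timeout.

    runtime_client: e.g. boto3.client("sagemaker-runtime").
    input_s3_uri: S3 location of the request payload.
    """
    return runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,
        InvocationTimeoutSeconds=MAX_INVOCATION_TIMEOUT_SECONDS,
    )


def add_backlog_scaling(autoscaling_client, endpoint_name,
                        variant_name="AllTraffic",        # hypothetical variant name
                        min_instances=1, max_instances=4,  # hypothetical limits
                        target_backlog_per_instance=5.0):  # hypothetical target
    """Target-tracking policy on ApproximateBacklogSizePerInstance.

    autoscaling_client: e.g. boto3.client("application-autoscaling").
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Make the variant's instance count scalable within the given bounds.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )

    # Add or remove instances to keep the average backlog per instance
    # near the target value.
    return autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-backlog-scaling",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_backlog_per_instance,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,   # seconds; hypothetical
            "ScaleOutCooldown": 300,  # seconds; hypothetical
        },
    )
```

In practice you would call these with `boto3.client("sagemaker-runtime")` and `boto3.client("application-autoscaling")` respectively. Target tracking is used here because it keeps the backlog per instance near a single number without having to define separate scale-out and scale-in alarms.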