I stumbled upon this error when invoking the streaming endpoint using boto3:
An error occurred (ModelStreamError) when calling the InvokeEndpointWithResponseStream operation: Your model primary did not complete sending the inference response in the allotted time.
Is there any way to increase the timeout period? I don't want to use async-invoke.
Why are you using streaming? And how long does your container take to respond to a request? The container response requirements are documented here: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-code-container-response