SageMaker Inference Endpoint timeout value?


Hello,

I am trying to deploy this GitHub solution, but I do not have access to an ml.g5.12xlarge and am hoping to run it on an ml.g5.4xlarge. Based on the error I am getting (see the dump below, which shows timeout = 60s), I am wondering whether there is some sort of timeout variable I can set when I create the SageMaker endpoint, to increase how long it waits for the model to respond to the query.

PS: I am pretty sure the issue is not related to this post here

Thank you,

---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/urllib3/connectionpool.py:467, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    463         except BaseException as e:
    464             # Remove the TypeError from the exception chain in
    465             # Python 3 (including for exceptions like SystemExit).
    466             # Otherwise it looks like a bug in the code.
--> 467             six.raise_from(e, None)
    468 except (SocketTimeout, BaseSSLError, SocketError) as e:


TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError                          Traceback (most recent call last)
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/botocore/httpsession.py:464, in URLLib3Session.send(self, request)
    463 request_target = self._get_request_target(request.url, proxy_url)
--> 464 urllib_response = conn.urlopen(
    465     method=request.method,
    466     url=request_target,
    467     body=request.body,
    468     headers=request.headers,
    469     retries=Retry(False),
    470     assert_same_host=False,
    471     preload_content=False,
    472     decode_content=False,
    473     chunked=self._chunked(request.headers),
    474 )
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/urllib3/connectionpool.py:358, in HTTPConnectionPool._raise_timeout(self, err, url, timeout_value)
    357 if isinstance(err, SocketTimeout):
--> 358     raise ReadTimeoutError(
    359         self, url, "Read timed out. (read timeout=%s)" % timeout_value
    360     )
    362 # See the above comment about EAGAIN in Python 3. In Python 2 we have
    363 # to specifically catch it and throw the timeout error

ReadTimeoutError: AWSHTTPSConnectionPool(host='runtime.sagemaker.us-east-1.amazonaws.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/botocore/httpsession.py:501, in URLLib3Session.send(self, request)
    500 except URLLib3ReadTimeoutError as e:
--> 501     raise ReadTimeoutError(endpoint_url=request.url, error=e)
    502 except ProtocolError as e:

ReadTimeoutError: Read timeout on endpoint URL: "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/aws-genai-mda-blog-flan-t5-xxl-endpoint-6ecf4020/invocations"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[14], line 20
      7 query = "How many covid cases are there in the state of NY"
---> 20 response = run_query(query)
     21 print("----------------------------------------------------------------------")
     22 print(f'SQL and response from user query {query}  \n  {response}')

Cell In[13], line 51, in run_query(query)
---> 51 channel, db = identify_channel(query)

AWS
asked 2 months ago · 1035 views
1 Answer

Hi,

Please check out this page for the available timeout parameters to configure: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-hosting.html

Also, you may want to adapt the values of the environment variables SAGEMAKER_TS_RESPONSE_TIMEOUT and SAGEMAKER_MODEL_SERVER_TIMEOUT inside your container model instance (if they exist for this specific model) to increase the timeout value.
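As a minimal sketch, these environment variables would be passed when the model object is created. The 600-second values and the deployment snippet below are illustrative assumptions, not taken from the blog solution:

```python
# Server-side timeouts: passed as env vars when creating the SageMaker model.
# Values are in seconds and purely illustrative; not every container honors both.
model_env = {
    "SAGEMAKER_TS_RESPONSE_TIMEOUT": "600",   # TorchServe response timeout
    "SAGEMAKER_MODEL_SERVER_TIMEOUT": "600",  # model server worker timeout
}

# With the SageMaker Python SDK this would look roughly like the following
# (hypothetical role/image values, left as comments so the sketch stays
# self-contained and does not call AWS):
# from sagemaker.huggingface import HuggingFaceModel
# model = HuggingFaceModel(role=role, env=model_env, ...)
# predictor = model.deploy(initial_instance_count=1,
#                          instance_type="ml.g5.4xlarge")
```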

Best,

Didier

AWS
EXPERT
answered 2 months ago
  • Thank you for the insights. I added values to the timeout variables you mentioned, but I still got similar errors. I also tried a couple of different Hugging Face models and deep learning containers (DLCs) that are supposed to perform faster, but it seemed to make very little difference. Do you think trying the smaller flan-t5-xl model is worth it?

    Just to summarize the models used and the errors I am getting:

    Model: Flan-T5-XXL
    DLC: pytorch-inference:1.12.0-gpu-py38
    Error: ReadTimeoutError: AWSHTTPSConnectionPool(host='runtime.sagemaker.us-east-1.amazonaws.com', port=443): Read timed out. (read timeout=60)

    Model: Flan-T5-XXL-FP16
    DLC: huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04
    Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: "InternalServerException", {"message": "addmm_impl_cpu_" not implemented for Half"}

    Model: Flan-T5-XXL-BNB-INT8
    DLC: pytorch-inference:1.12.0-gpu-py38
    Error: ReadTimeoutError: AWSHTTPSConnectionPool(host='runtime.sagemaker.us-east-1.amazonaws.com', port=443): Read timed out. (read timeout=60)
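    One detail worth noting about the repeated `read timeout=60`: that value is the client-side botocore read timeout, not a server setting, so even with the container env vars raised, the caller gives up after 60 seconds unless its own timeout is raised too. A minimal sketch, where the 300-second value is an illustrative assumption:

    ```python
    import boto3
    from botocore.config import Config

    # botocore defaults to a 60 s read timeout, which is exactly what the
    # traceback shows. Raise it for the sagemaker-runtime client so slow
    # generations have time to finish before the client raises ReadTimeoutError.
    cfg = Config(read_timeout=300, retries={"max_attempts": 0})
    runtime = boto3.client("sagemaker-runtime",
                           region_name="us-east-1", config=cfg)

    # runtime.invoke_endpoint(EndpointName=..., Body=...) would now wait up to
    # 300 s for the model to respond.
    ```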
