SageMaker - ResourceLimitExceeded

ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'Memory size in MB per serverless endpoint' is 3072 MBs, with current utilization of 0 MBs and a request delta of 6144 MBs. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.

I don't understand which path to follow in Service Quotas to increase "MemorySizeInByte" for a SageMaker serverless deployment.

1 Answer
  1. Go to the Service Quotas console and open the quota increase request page.
  2. For Limit Type, search for and select SageMaker Endpoints.
  3. Make sure the region is the same as the region displayed in the top-right corner of your AWS console.
  4. Provide a description with the details of the error message.
  5. Click Submit (or see the sketch after this list for a programmatic route).
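
If you'd rather do this programmatically, here is a minimal sketch using the Service Quotas API via boto3. It assumes credentials for the affected account and the us-east-1 region (taken from the SAGEMAKER_REGION value in the code further down); the quota is looked up by name rather than hard-coding a QuotaCode:

    # Minimal sketch, assuming boto3 credentials for the affected account.
    import boto3

    client = boto3.client('service-quotas', region_name='us-east-1')

    # Find the serverless-endpoint memory quota among SageMaker's quotas.
    target = None
    paginator = client.get_paginator('list_service_quotas')
    for page in paginator.paginate(ServiceCode='sagemaker'):
        for quota in page['Quotas']:
            name = quota['QuotaName'].lower()
            if 'serverless' in name and 'memory' in name:
                target = quota
                print(quota['QuotaName'], quota['QuotaCode'], quota['Value'])

    # Request an increase to 6144 MB, the delta reported in the error.
    if target is not None:
        response = client.request_service_quota_increase(
            ServiceCode='sagemaker',
            QuotaCode=target['QuotaCode'],
            DesiredValue=6144.0,
        )
        print(response['RequestedQuota']['Status'])

Either route creates the same quota increase request, so the review timeline is the same.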

You'll get a response within 24-48 hours from the time you log the support case for the limit increase.

Note: Without looking further into the details of this error message in your account, it'd be a little hard to tell what exactly is blocking this; the closest service limit listed in the drop-down is SageMaker Endpoints, which is why I'm suggesting it here.

Also, if you have a support plan that lets you create support cases, I'd suggest logging a case under the Technical category, as support would be in a better position to tell you exactly what's going on and which limits are blocking this.

Hope you find this information helpful.

Comment here if you have additional questions; happy to help.

Abhishek

AWS
EXPERT
answered 9 months ago
    # SESSION, model_s3_uri, and role are defined elsewhere in the asker's notebook.
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel
    from sagemaker.serverless import ServerlessInferenceConfig

    transformers_version = '4.12.3'
    pytorch_version = '1.9.1'
    py_version = 'py38'
    region = SESSION.boto_region_name

    image_uri = sagemaker.image_uris.retrieve(
        framework='huggingface',
        base_framework_version=f'pytorch{pytorch_version}',
        region=region,
        version=transformers_version,
        py_version=py_version,
        instance_type='ml.m5.large',  # no GPU support on serverless inference
        image_scope='inference',
    )

    huggingface_model = HuggingFaceModel(
        model_data=model_s3_uri,  # path to your trained SageMaker model
        role=role,                # IAM role with permissions to create an endpoint
        py_version='py38',        # Python version of the DLC
        image_uri=image_uri,      # SageMaker container image URI
        env={
            "MMS_MAX_RESPONSE_SIZE": "20000000",
            "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
            "SAGEMAKER_PROGRAM": "inference.py",
            "SAGEMAKER_REGION": "us-east-1",
            "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
        },
    )

    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=6019,
        max_concurrency=8,
    )

    WHERE THE ERROR OCCURS:

    predictor = huggingface_model.deploy(
        serverless_inference_config=serverless_config
    )
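
One note on why the numbers in the error differ from the code: serverless endpoints accept memory sizes in 1 GB increments from 1024 to 6144 MB, so the memory_size_in_mb=6019 request is counted as 6144 MB against the quota, which matches the "request delta of 6144 MBs" in the error. Until the quota increase is approved, a sketch of a workaround is to deploy within the current 3072 MB quota (whether that is enough memory for this particular Hugging Face model is something you'd have to test):

    # Hypothetical workaround: stay within the current 3072 MB account quota
    # until the quota increase is approved.
    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=3072,  # largest size the current quota allows
        max_concurrency=8,
    )
    predictor = huggingface_model.deploy(
        serverless_inference_config=serverless_config
    )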
