Error Deploying Model to Endpoint with initial_instance_count=2

When I attempt to deploy a model to an endpoint with initial_instance_count=1, it works fine. Attempting to deploy the same model to an endpoint with initial_instance_count=2, however, always results in the following error:

UnexpectedStatusException: Error hosting endpoint mm-xxxxxxxx-content-recommender-2024-09-27-18-32-34: Failed. Reason: error: Failed to download model data from URL "s3://sagemaker-us-west-2-xxxxxxxxxxxx/tensorflow-inference-2024-09-27-18-32-34-972/model.tar.gz". Please ensure that an S3 VPC endpoint exists in route table or NAT gateway for the VPC mode and the URL is reachable from within the subnets provided.. Try changing the instance type or reference the troubleshooting page https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html

The deployment code is as follows:

import logging  # needed for container_log_level=logging.WARNING below
from time import gmtime, strftime

from sagemaker.tensorflow import TensorFlowModel
from sagemaker.tensorflow import TensorFlowPredictor

vpc_config = {
  "Subnets": ["subnet-abc123", "subnet-abc456"], # Need subnets in at least 2 distinct availability zones.
  "SecurityGroupIds": ["sg-abc123"]
}

mm = TensorFlowModel(
  entry_point="inference.py",
  source_dir="multi/code",
  model_data=f"s3://{bucket}/{prefix}/{multi_archive}",
  role=role,
  framework_version="2.16",
  container_log_level=logging.WARNING,
  vpc_config=vpc_config,
  env={ "SAGEMAKER_TFS_DEFAULT_MODEL_NAME": "ranking" })

timestamp_suffix = strftime('%Y-%m-%d-%H-%M-%S', gmtime())
endpoint_name = f"mm-xxxxxxxx-{timestamp_suffix}"

mm_predictor = mm.deploy(
  initial_instance_count=2,
  instance_type="ml.c5.9xlarge",
  endpoint_name=endpoint_name
  )

I tried setting model_data_download_timeout=1200, but it didn't help.

1 Answer
Hi,

First, did you check that the model you use is properly stored in the S3 bucket where SageMaker looks for it, i.e. s3://sagemaker-us-west-2-xxxxxxxxxxxx/tensorflow-inference-2024-09-27-18-32-34-972/model.tar.gz?
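One quick way to verify this from the notebook is a `head_object` check. This is just a sketch: the `s3_url` value is a placeholder for the URL from your error message, and boto3 is assumed to be available (it is preinstalled on SageMaker notebook instances).

```python
from urllib.parse import urlparse


def parse_s3_url(url: str) -> tuple:
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")


def model_exists(s3_url: str) -> bool:
    """Return True if the object at s3_url exists and is readable."""
    import boto3  # assumed available in the SageMaker environment

    bucket, key = parse_s3_url(s3_url)
    s3 = boto3.client("s3")
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except s3.exceptions.ClientError:
        return False


# Example (placeholder URL; substitute the one from your error message):
# model_exists("s3://sagemaker-us-west-2-xxxxxxxxxxxx/.../model.tar.gz")
```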

Second, are you sure that the SageMaker notebook from which you run the deploy is allowed to access S3? See https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html to define your role properly.
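If you want to check the role's permissions programmatically rather than by reading policies, IAM's policy simulator can evaluate whether a given role is allowed to fetch the object. A sketch (the role ARN and bucket/key are placeholders; boto3 is assumed to be available):

```python
def s3_object_arn(bucket: str, key: str) -> str:
    """Build the ARN for an S3 object."""
    return f"arn:aws:s3:::{bucket}/{key}"


def role_can_get_object(role_arn: str, bucket: str, key: str) -> bool:
    """Use the IAM policy simulator to test s3:GetObject for the role."""
    import boto3  # assumed available in the SageMaker environment

    iam = boto3.client("iam")
    resp = iam.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=["s3:GetObject"],
        ResourceArns=[s3_object_arn(bucket, key)],
    )
    return all(r["EvalDecision"] == "allowed" for r in resp["EvaluationResults"])


# Example (placeholder ARN):
# role_can_get_object("arn:aws:iam::123456789012:role/MySageMakerRole",
#                     "sagemaker-us-west-2-xxxxxxxxxxxx", "path/model.tar.gz")
```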

Third, if your notebook is attached to a fully private VPC, you will also have to create an S3 VPC endpoint in your VPC to access the S3 bucket privately. See https://docs.aws.amazon.com/sagemaker/latest/dg/host-vpc.html for all details about creating such a VPC endpoint.
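For reference, creating an S3 gateway endpoint can also be scripted. This is a sketch, not a drop-in fix: the VPC ID, route table IDs, and region are placeholders you'd replace with your own, and boto3 is assumed to be available.

```python
def s3_service_name(region: str) -> str:
    """Service name for the S3 gateway endpoint in a given region."""
    return f"com.amazonaws.{region}.s3"


def create_s3_gateway_endpoint(vpc_id: str, route_table_ids: list, region: str) -> str:
    """Create an S3 gateway VPC endpoint and return its ID."""
    import boto3  # assumed available in the SageMaker environment

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.create_vpc_endpoint(
        VpcId=vpc_id,
        ServiceName=s3_service_name(region),
        VpcEndpointType="Gateway",
        RouteTableIds=route_table_ids,
    )
    return resp["VpcEndpoint"]["VpcEndpointId"]


# Example (placeholder IDs):
# create_s3_gateway_endpoint("vpc-abc123", ["rtb-abc123"], "us-west-2")
```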

Best,

Didier

answered a year ago
  • Thanks, Didier, but please note that deploying the model using the same code with initial_instance_count=1 works fine. So the model assets are indeed located in the right S3 bucket, and the notebook has access to that location. The page you shared about the VPC links to a page about VPCs and batch transforms (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-vpc.html#batch-vpc-ip), which states, "Your VPC subnets should have at least two private IP addresses for each instance in a transform job." So I'm guessing the problem is that I've only defined enough private IP addresses for one instance.
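Following the "at least two private IP addresses per instance" guidance from that page, the subnets' spare capacity can be checked with a quick sketch (boto3 and credentials with `ec2:DescribeSubnets` are assumed; the two-IPs-per-instance figure is taken from the linked batch-transform page):

```python
def required_free_ips(instance_count: int, ips_per_instance: int = 2) -> int:
    """Minimum free private IPs, per the batch-transform VPC guidance."""
    return instance_count * ips_per_instance


def subnets_have_capacity(subnet_ids: list, instance_count: int) -> bool:
    """Compare available IPs across the subnets to the requirement."""
    import boto3  # assumed available in the SageMaker environment

    ec2 = boto3.client("ec2")
    resp = ec2.describe_subnets(SubnetIds=list(subnet_ids))
    free = sum(s["AvailableIpAddressCount"] for s in resp["Subnets"])
    return free >= required_free_ips(instance_count)


# Example (placeholder subnet IDs from the question):
# subnets_have_capacity(["subnet-abc123", "subnet-abc456"], instance_count=2)
```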
