Can we connect to the instance (via ssh or other means) where a Triton Sagemaker endpoint is deployed?

I am deploying a Triton server endpoint on SageMaker and I want to SSH into the instance where the endpoint is running for debugging purposes. I can't find a way to identify the instance (e.g. find the instance ID) and connect to it. I saw this repo, https://github.com/aws-samples/sagemaker-ssh-helper#inference, but it seems to work only with Estimator objects in SageMaker. I was wondering whether there is a way to do the same thing with the low-level SageMaker client (botocore.client.SageMaker). I am using the SageMaker client to create the endpoint with the Triton image from Amazon ECR.

1 Answer

That SSH Helper library/sample is likely still the approach you want to take; it just may need some extra hacking, since, as you mentioned, it's built around the high-level SageMaker Python SDK. The SageMaker Python SDK (SMPySDK) is open source and calls the same service APIs as boto3 under the hood, so it's a matter of figuring out how the SSH connectivity is wired up and replicating it in your low-level API calls.

Per the SSH Helper Readme, to use the helper with an endpoint you need to:

  • Add the helper library as a dependency of the model, which equates to including it under the code/ folder of your model.tar.gz (see the re-pack decision and the repack_model implementation in the SMPySDK source)
  • Edit your inference.py to import and set up the SSH helper library (a minimal sketch follows this list).
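
Per the Readme, that edit amounts to a couple of lines at the top of the script. Roughly (the handler functions below are only placeholders for whatever your serving code already defines):

# code/inference.py -- minimal sketch of wiring in the SSH helper
import sagemaker_ssh_helper

# Bootstraps the SSM agent / SSH connectivity when the container loads the
# model, provided the SSH_* environment variables are set on the model.
sagemaker_ssh_helper.setup_and_start_ssh()


def model_fn(model_dir):
    # load and return your model as usual
    ...


def predict_fn(data, model):
    # run inference as usual
    ...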

On the automated side, from the SSHModelWrapper source code it looks like the only modification the class makes to the Model object is to add some environment variables to the model definition:

env.update({'START_SSH': str(self.bootstrap_on_start).lower(),
            'SSH_SSM_ROLE': self.ssm_iam_role,
            'SSH_OWNER_TAG': user_id,
            'SSH_LOG_TO_STDOUT': str(self.log_to_stdout).lower(),
            'SSH_WAIT_TIME_SECONDS': f"{self.connection_wait_time_seconds}"})

...So if you're creating and deploying your model objects via boto3, I believe you should be able to get an equivalent setup by doing the same steps:

  • Ensuring your model.tar.gz contains the SSH helper library code and an appropriate inference.py under the code/ subfolder
  • Setting the Environment variables the SSH helper library expects when calling CreateModel (a boto3 sketch follows this list).
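
For illustration, that CreateModel call via boto3 might look something like the sketch below. The model name, role ARN, image URI, S3 path, and the specific environment variable values are placeholders; check the SSH Helper docs (or an SDK-created model, as suggested further down) for what each variable should actually be set to:

import boto3

sm = boto3.client("sagemaker")

# Environment variables mirroring what SSHModelWrapper injects (values are
# placeholders -- take the real ones from the SSH Helper docs or from a
# DescribeModel call on an SDK-created model).
ssh_env = {
    "START_SSH": "true",
    "SSH_SSM_ROLE": "<ssm-role-name>",    # whatever self.ssm_iam_role resolves to in your setup
    "SSH_OWNER_TAG": "<your-user-id>",
    "SSH_LOG_TO_STDOUT": "false",
    "SSH_WAIT_TIME_SECONDS": "600",
}

sm.create_model(
    ModelName="triton-ssh-debug",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/MySageMakerExecutionRole",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>",
        "ModelDataUrl": "s3://<bucket>/models/model.tar.gz",  # repacked with code/ + SSH helper inside
        # Merge the SSH variables with whatever Triton-specific variables you already set:
        "Environment": ssh_env,
    },
)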

The fastest way to debug/get this working may be to create a temporary model + endpoint using the SM Python SDK, then inspect the created model.tar.gz and use DescribeModel / DescribeEndpointConfig / DescribeEndpoint to fully understand what configuration you need to replicate. (To clear up a misconception I've heard in the past: yes, you can import a pre-trained model.tar.gz bundle using the SM Python SDK Model object; you don't need to start from an Estimator and run a training job from scratch.)
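
For example, something along these lines (the endpoint name is a placeholder) would surface the repacked artifact location and the injected environment variables:

import boto3

sm = boto3.client("sagemaker")

# Walk from the temporary SDK-created endpoint back to its model definition.
endpoint = sm.describe_endpoint(EndpointName="<temporary-endpoint-name>")
endpoint_config = sm.describe_endpoint_config(
    EndpointConfigName=endpoint["EndpointConfigName"]
)
model_name = endpoint_config["ProductionVariants"][0]["ModelName"]
model = sm.describe_model(ModelName=model_name)

print(model["PrimaryContainer"]["ModelDataUrl"])          # repacked model.tar.gz to download and inspect
print(model["PrimaryContainer"].get("Environment", {}))   # the SSH_* variables the SDK injected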

If you're not using the AWS framework containers (for example you're using a built-in SageMaker algorithm, a JumpStart model, or a from-scratch custom container instead of the AWS-provided containers for PyTorch, TensorFlow, etc.), then your serving stack might not support an inference.py script bundle, in which case things are a bit more complicated: you'd need to bake the SSH library into the container image itself and edit the serving stack to make sure it gets initialized.

Answered a year ago by Alex_T (AWS EXPERT)
