Creating a SageMaker Training Pipeline for a Triton Model

I have an existing model training procedure that creates and registers a Model that uses the Triton Inference Container. The model is an ensemble with multiple steps.

Unlike most SageMaker models, which have paired training and inference containers, there is only a Triton inference container. My current process for building the model is:

  1. Train a Huggingface model using the Huggingface container
  2. Construct a model-registry on the local filesystem, i.e.
model-registry
    /tokenizer
        config.pbtxt
        /1
    /encoder
        config.pbtxt
        /1
           model.py

(the config.pbtxt and model.py files are stored in a git repository)

  3. Use ONNX to compile the trained model from step 1 and copy it to model-registry/encoder/1, and also copy the tokenizer from the same model to model-registry/tokenizer/1
  4. Run a bash script that creates a conda environment with the required Python version, installs dependencies, uses conda pack to create an environment.tar.gz, and copies it to model-registry/tokenizer
  5. Compress model-registry to model.tar.gz
  6. Construct a Model from the model.tar.gz and the Triton container
  7. Register the model in the model registry

I'm in the process of migrating this to a SageMaker pipeline. Step 1 is trivial, as are steps 6 and 7.

I'm trying to work out a good approach for steps 2-5. There are two things that I'm unsure of.

First, the sagemaker.processing.Processor derivatives generally appear to have a single source dependency (i.e. a python script). But in my case, I have multiple files tracked in git - i.e. each model that makes up the ensemble has a tracked config.pbtxt file, and each python model has a model.py file. These are separate from the script(s) that perform the processing.

Second, step 4 is a shell script. If need be, I could create a Python wrapper that calls subprocess.Popen, but that seems like overkill; I'd prefer to define a processing step that runs a bash script directly.

At this stage I suspect I'm going to have to create a custom Docker container for steps 2-5 (https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html) that gets built and registered as part of the CI/CD process that creates the pipeline. Does this seem like the best approach, or are there other options?

Dave
asked 9 months ago, 358 views
1 Answer

Hello. From your description, I understand that you would like to know the best approach to package your trained model for the Triton inference container image.

There are some example notebooks on this topic [1], and I mainly tested this one [2]. In short, the notebook uses a Docker container to compile the model.pt, and uses some commands to create the folder, move the model file in the folder, and create the tar file. What the notebook does is similar to your steps. You can simply run a few cells and the Model is ready to be deployed.

Also, in SageMaker, "Pipeline" usually refers to Amazon SageMaker Model Building Pipelines [3]. I am not sure whether that is what you mean. If so, you can create a Pipeline that has a step to train your model, followed by one or more Processing steps [4] to accomplish your steps 2-5.

Regarding steps 2-5, I think they fall into two categories.

  • Step 3: I assume you are using Python code such as "transformers.onnx.export()" [5] to compile the model.
  • Steps 2, 4, and 5: These steps can be done with bash commands.

With a Processing Job, you can bring your own container. This doc [6] gives an example Dockerfile that uses python3 as the entrypoint. You can implement the logic below in processing_script.py.

  1. Download your trained model to local disk.
  2. Compile the model.
  3. Upload the compiled model to an S3 bucket.
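
A minimal sketch of what processing_script.py could look like, assuming the training artifact arrives as a model.tar.gz under /opt/ml/processing/input/model (mounted via a ProcessingInput) and that anything written to /opt/ml/processing/output is uploaded via a ProcessingOutput, so no explicit S3 calls are needed. The function name compile_model and the export_fn callback are hypothetical; export_fn stands in for whichever ONNX export call you already use.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Callable

# Conventional Processing Job paths; the sub-directory names are assumptions.
INPUT_DIR = Path("/opt/ml/processing/input/model")
OUTPUT_DIR = Path("/opt/ml/processing/output")


def compile_model(
    archive: Path,
    output_dir: Path,
    export_fn: Callable[[Path, Path], None],
) -> Path:
    """Unpack the training artifact and export it to ONNX.

    export_fn(model_dir, onnx_path) is a placeholder for your own export
    code; it receives the unpacked model directory and the target file.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    onnx_path = output_dir / "model.onnx"
    with tempfile.TemporaryDirectory() as tmp:
        # Training jobs write their result as a gzipped tar archive.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(tmp)
        export_fn(Path(tmp), onnx_path)
    return onnx_path
```

Inside the container, the script would end with a call such as compile_model(INPUT_DIR / "model.tar.gz", OUTPUT_DIR, my_export), where my_export is your own function.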

Apart from running a Python script as shown in [6], you can also use the Dockerfile below to have the Processing Job run a bash script.

FROM python:3.7-slim-buster

# Add a bash script and configure Docker to run it
ADD run.sh /
ENTRYPOINT ["/bin/bash", "/run.sh"]

In the run.sh file, implement the logic below.

  1. (Step 2) Create the folder structure.
  2. Download your compiled model from S3 to the folder.
  3. (Step 4) Create a conda environment, install dependencies, run conda pack, and copy the result to the folder.
  4. Create config.pbtxt, and copy it to the folder.
  5. (Step 5) Once all files are ready, compress the folder and upload the tar.gz to S3.
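
If you would rather keep the folder assembly and compression in Python and leave only the conda part in bash, the same logic can be sketched with the standard library. This is an illustration under assumptions, not the one right layout: package_repository and its parameters are hypothetical names, tracked_dir is assumed to mirror the model-registry tree from the question, and the archive places each model directory at its root, as the example notebooks in [1] do. Copying the tokenizer files into tokenizer/1 is omitted for brevity.

```python
import shutil
import tarfile
from pathlib import Path


def package_repository(
    tracked_dir: Path,
    onnx_model: Path,
    conda_env: Path,
    work_dir: Path,
    archive: Path,
) -> Path:
    """Assemble the model-registry folder and compress it to a tar.gz."""
    repo = work_dir / "model-registry"
    # Copy the git-tracked files (config.pbtxt, model.py); repo must not exist yet.
    shutil.copytree(str(tracked_dir), str(repo))

    # Version directories may be missing, since git does not track empty folders.
    encoder_version = repo / "encoder" / "1"
    encoder_version.mkdir(parents=True, exist_ok=True)
    (repo / "tokenizer" / "1").mkdir(parents=True, exist_ok=True)

    # Place the compiled model and the conda-pack environment.
    shutil.copy(str(onnx_model), str(encoder_version / "model.onnx"))
    shutil.copy(str(conda_env), str(repo / "tokenizer" / "environment.tar.gz"))

    # Assumption: each model directory sits at the root of the archive.
    with tarfile.open(str(archive), "w:gz") as tar:
        for model_dir in sorted(repo.iterdir()):
            tar.add(str(model_dir), arcname=model_dir.name)
    return archive
```

Writing the archive under /opt/ml/processing/output lets a ProcessingOutput upload it to S3 when the job finishes.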

To use your own container image, follow "Step 5: Push the Container to Amazon Elastic Container Registry (Amazon ECR)" in [7] to build your image and push it to ECR, modifying the code for your image as needed. Then you can use the code below to run the Processing Job.

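import sagemaker
from sagemaker.processing import Processor

# IAM role for the job; get_execution_role() works inside SageMaker notebooks/Studio
role = sagemaker.get_execution_role()

processor = Processor(image_uri='<YOUR_ECR_IMAGE>',
                     role=role,
                     instance_count=1,
                     instance_type="<INSTANCE_TYPE>")
# Starts the Processing Job, which runs the image's ENTRYPOINT
processor.run()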
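
On the question of multiple git-tracked files: ProcessingInput accepts a local path as its source, and the SDK uploads it to S3 before the job starts, so a whole directory of config.pbtxt and model.py files can be passed next to the compiled model. The sketch below shows one way this might look. The names run_packaging_job and container_path, the input and output names, and the sub-directories under /opt/ml/processing are all illustrative assumptions.

```python
from pathlib import PurePosixPath

# Conventional root for Processing Job data inside the container.
PROCESSING_ROOT = PurePosixPath("/opt/ml/processing")


def container_path(*parts: str) -> str:
    """Build a path under the processing root, e.g. input/model."""
    return str(PROCESSING_ROOT.joinpath(*parts))


def run_packaging_job(image_uri, role, instance_type, model_s3_uri, config_dir):
    """Run the custom image with the compiled model and tracked files mounted."""
    # Imported here so the helper above can be used without the SDK installed.
    from sagemaker.processing import ProcessingInput, ProcessingOutput, Processor

    processor = Processor(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type=instance_type,
    )
    processor.run(
        inputs=[
            # Compiled model that an earlier step wrote to S3.
            ProcessingInput(
                source=model_s3_uri,
                destination=container_path("input", "model"),
                input_name="model",
            ),
            # Local directory from the git checkout; uploaded by the SDK.
            ProcessingInput(
                source=config_dir,
                destination=container_path("input", "config"),
                input_name="config",
            ),
        ],
        outputs=[
            # Everything run.sh writes here is uploaded when the job ends.
            ProcessingOutput(
                source=container_path("output"),
                output_name="triton_model",
            )
        ],
    )
    return processor
```

With this arrangement run.sh only reads from and writes to local folders, and the S3 transfers are handled by the job itself.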

You can also wrap it in a ProcessingStep, as described in [4].
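
A rough sketch of how that might fit together, assuming you already have a TrainingStep and a configured Processor. The function name build_pipeline, the step name, and the container paths are hypothetical. It uses the processor/inputs/outputs form of ProcessingStep; newer SDK versions also offer a step_args form, so check which one your version prefers.

```python
def build_pipeline(pipeline_name, train_step, processor, config_dir):
    """Chain an existing TrainingStep into a packaging ProcessingStep."""
    # Imported lazily so the sketch can be imported without the SDK installed.
    from sagemaker.processing import ProcessingInput, ProcessingOutput
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.steps import ProcessingStep

    package_step = ProcessingStep(
        name="PackageTritonModel",  # hypothetical step name
        processor=processor,
        inputs=[
            # Referencing the training step's output also creates the dependency.
            ProcessingInput(
                source=train_step.properties.ModelArtifacts.S3ModelArtifacts,
                destination="/opt/ml/processing/input/model",
            ),
            # Git-tracked config.pbtxt and model.py files from a local directory.
            ProcessingInput(
                source=config_dir,
                destination="/opt/ml/processing/input/config",
            ),
        ],
        outputs=[
            ProcessingOutput(
                output_name="triton_model",
                source="/opt/ml/processing/output",
            )
        ],
    )
    return Pipeline(name=pipeline_name, steps=[train_step, package_step])
```

The returned Pipeline can then be created or updated with pipeline.upsert(role_arn=role) from your CI/CD process.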

In summary, the Pipeline has the following steps:

  1. A training step to train your model.
  2. A processing step that runs Python code to compile your model.
  3. A processing step that runs bash code to prepare the folder structure and create the tar.gz.
  4. Optional other steps.

Hope this information helps.

[1] https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-triton

[2] https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/resnet50/triton_resnet50.ipynb

[3] https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html

[4] https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing

[5] https://aws.amazon.com/blogs/machine-learning/host-ml-models-on-amazon-sagemaker-using-triton-onnx-models/

[6] https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html

[7] https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html

Kai_Z
answered 9 months ago
