Hi,
This very recent blog post goes into full detail about creating such a Triton backend: https://aws.amazon.com/blogs/machine-learning/host-ml-models-on-amazon-sagemaker-using-triton-python-backend/
Best,
Didier
Thanks Didier, but as far as I can see that post doesn't cover building a triton_python_backend_stub if you're not using the same version of Python - it just covers how to use conda-pack to add dependencies, but assumes you create a conda 3.8 environment.
I have managed to build a Python 3.8 triton_python_backend_stub for the official NVIDIA Triton image, so I am now at least able to test my ensemble locally before deploying to AWS, and that works fine - but I'm now running into different issues.
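(For reference, the stub build is roughly the procedure from the python_backend README - run it inside the activated conda 3.8 environment so cmake picks up that Python, and use the branch matching the Triton release you target; the env name py38-transformers below is just an assumption matching my tarball name:)

git clone https://github.com/triton-inference-server/python_backend -b r23.07
cd python_backend
mkdir build && cd build
# Build the stub against the currently active Python (the conda 3.8 env)
cmake -DTRITON_ENABLE_GPU=ON \
      -DTRITON_BACKEND_REPO_TAG=r23.07 \
      -DTRITON_COMMON_REPO_TAG=r23.07 \
      -DTRITON_CORE_REPO_TAG=r23.07 ..
make triton-python-backend-stub
# Copy the stub into each Python model's directory, alongside config.pbtxt
cp triton_python_backend_stub /mnt/model-repository/tokenizer/
cp triton_python_backend_stub /mnt/model-repository/post_processor/
# Pack the conda env referenced by EXECUTION_ENV_PATH
conda pack -n py38-transformers -o py38-transformers.tar.gz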
Using the exact same repository that works fine locally (but without the stubs), the SageMaker container fails to load the exact same Python environment that works perfectly using nvcr.io/nvidia/tritonserver:23.07-py3 - it fails with 355873309152.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-tritonserver:23.05-py3.
When running in the official NVIDIA image the logs contain:
I0818 06:23:29.847878 100 python_be.cc:1746] Using Python execution env /mnt/model-repository/tokenizer/py38-transformers.tar.gz
I0818 06:23:29.847903 100 python_be.cc:1746] Using Python execution env /mnt/model-repository/post_processor/py38-transformers.tar.gz
(I have two Python models, tokenizer and post_processor.)
But when I run on AWS it's only creating one of the two environments, and it then seems to be using different env paths for the two Python models (/tmp/python_env_EmqFow/1/bin/activate for tokenizer and /tmp/python_env_EmqFow/0/bin/activate for post_processor):
I0818 07:21:44.156862 89 pb_env.cc:271] Extracting Python execution env /opt/ml/model/post_processor/py38-transformers.tar.gz
I0818 07:24:49.740470 89 stub_launcher.cc:257] Starting Python backend stub: source /tmp/python_env_EmqFow/1/bin/activate && exec env LD_LIBRARY_PATH=/tmp/python_env_EmqFow/1/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /opt/ml/model/tokenizer/1/model.py triton_python_backend_shm_region_2 16777216 1048576 89 /opt/tritonserver/backends/python 336 tokenizer
I0818 07:24:49.740535 89 stub_launcher.cc:257] Starting Python backend stub: source /tmp/python_env_EmqFow/0/bin/activate && exec env LD_LIBRARY_PATH=/tmp/python_env_EmqFow/0/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /opt/ml/model/post_processor/1/model.py triton_python_backend_shm_region_3 16777216 1048576 89 /opt/tritonserver/backends/python 336 post_processor_0
I0818 07:21:44.347011 107 pb_stub.cc:255] Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'transformers'
I've searched the logs for "Extracting Python execution env" and it only appears once, despite the following parameter appearing in both models' configs:
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/py38-transformers.tar.gz"}
}
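(One thing worth double-checking, assuming the repository is uploaded to SageMaker as a model.tar.gz archive - the paths below are placeholders - is that both configs and both env tarballs actually made it into the artifact:)

# Both models should show the EXECUTION_ENV_PATH parameter
grep -A2 EXECUTION_ENV_PATH model-repository/*/config.pbtxt
# Both copies of py38-transformers.tar.gz should appear in the packaged artifact
tar tzf model.tar.gz | grep py38-transformers.tar.gz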
I'm wondering if NVIDIA fixed an issue between 23.05 (SageMaker latest) and 23.07 (NVIDIA latest). I guess I need to run 23.05 locally and check.
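(Something along these lines should reproduce it locally against 23.05 - the host path is a placeholder for wherever the model repository lives:)

# Drop --gpus all if only exercising the Python models on CPU
docker run --rm --gpus all \
  -v /path/to/model-repository:/mnt/model-repository \
  nvcr.io/nvidia/tritonserver:23.05-py3 \
  tritonserver --model-repository=/mnt/model-repository --log-verbose=1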