Hello,
Thank you for using AWS SageMaker.
It is difficult to identify why this behavior occurs without logs for the task in question under your account. From the snippet shared above, I can see that the extended Docker image is based on the GPU variant pytorch-inference:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker, while the batch transform job was created with a CPU instance type ("InstanceType": "ml.m5.large").
I'd recommend fixing that configuration and running the batch transform job again. If you still observe a similar issue, I'd recommend reaching out to AWS Support for further investigation along with all the details and logs, since sharing logs on this platform is not recommended.
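As a rough sketch of what a consistent configuration looks like (the job name, model name, and S3 paths here are placeholders, not values from your account), the instance type in TransformResources should match the image variant the model's container uses:

```python
import boto3

sm = boto3.client("sagemaker")

# Minimal sketch: the model's container should use the CPU image variant
# (e.g. ...pytorch-inference:1.10.2-cpu-py38-ubuntu20.04-sagemaker) when the
# transform job runs on a CPU instance type like ml.m5.large, and the GPU
# variant when it runs on a GPU instance type like ml.g4dn.xlarge.
sm.create_transform_job(
    TransformJobName="my-batch-transform",   # placeholder name
    ModelName="my-model",                    # placeholder; model registered with the matching image
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/batch-input/",  # placeholder
            }
        },
        "ContentType": "application/json",
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/batch-output/"},  # placeholder
    TransformResources={
        "InstanceType": "ml.m5.large",  # CPU instance -> CPU image variant
        "InstanceCount": 1,
    },
)
```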
Open a support case with AWS using the link: https://console.aws.amazon.com/support/home?#/case/create
Thanks, I'll take a look at that and see if it makes a difference, although the issue seems to be less about the inference itself and more about being unable to get the extended image to use my script instead of the default one.
I also have a support case open already; I'm just hoping to get some other views here too so the issue gets resolved as soon as possible.
Can you test including the inference.py script in the model tarball instead of baking it into the image? A sketch of that approach is below.
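One way to do that is to let the SageMaker Python SDK repackage the model artifact for you: when you pass entry_point, the SDK bundles your script under code/ in model.tar.gz and points the container at it. A minimal sketch, assuming placeholder paths, role ARN, and a local src/ directory containing inference.py:

```python
from sagemaker.pytorch import PyTorchModel

# Minimal sketch; model_data, role, source_dir, and image_uri are placeholders.
model = PyTorchModel(
    model_data="s3://my-bucket/model/model.tar.gz",          # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",     # placeholder
    entry_point="inference.py",       # your handler script
    source_dir="src",                 # local dir containing inference.py (assumption)
    framework_version="1.10",
    py_version="py38",
    image_uri="<your-extended-image-uri>",  # placeholder; omit to use the stock image
)

transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.large",  # must match a CPU image variant
)
transformer.transform("s3://my-bucket/batch-input/", content_type="application/json")
```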
I'm running into the same problem now. In the CloudWatch logs it uses default_pytorch_inference_handler.py instead of my inference.py. Did you manage to solve the problem yet?
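For anyone comparing against their own script: the PyTorch serving stack falls back to the default handler when it cannot locate your entry point, and when it does find it, it looks for the handler functions below. A minimal sketch of inference.py (the model artifact name and JSON handling are placeholders):

```python
import json
import os
import torch

def model_fn(model_dir):
    """Load the model from model_dir (the unpacked contents of model.tar.gz)."""
    model = torch.jit.load(os.path.join(model_dir, "model.pt"))  # placeholder artifact name
    model.eval()
    return model

def input_fn(request_body, content_type):
    """Deserialize the request payload."""
    if content_type == "application/json":
        return torch.tensor(json.loads(request_body))
    raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(data, model):
    """Run inference on the deserialized input."""
    with torch.no_grad():
        return model(data)

def output_fn(prediction, accept):
    """Serialize the prediction for the response."""
    return json.dumps(prediction.tolist())
```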