Inference Recommendation fails due to image size error
Hello AWS team!
I am trying to run a suite of inference recommendation jobs that leverage the NVIDIA Triton Inference Server on a set of GPU instances (ml.g5.12xlarge, ml.g5.8xlarge, ml.g5.16xlarge) as well as AWS Inferentia instances (ml.inf2.2xlarge, ml.inf2.8xlarge, ml.inf2.24xlarge).
The following parameters customize each job (roughly as sketched below):

- SAGEMAKER_MODEL_SERVER_WORKERS = 1
- OMP_NUM_THREADS = 3
- JobType = Default (not Advanced)
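For context, this is roughly how each job is created with boto3. The job name, role ARN, and model package ARN are placeholders, and the two environment variables above are assumed to be set on the registered model package's Triton container rather than on the job itself:

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Default (not Advanced) recommendation job. SAGEMAKER_MODEL_SERVER_WORKERS=1 and
# OMP_NUM_THREADS=3 are assumed to be defined in the Environment of the registered
# model package's Triton container, so they are not repeated here.
sm.create_inference_recommendations_job(
    JobName="triton-llm-recommender-default",  # placeholder name
    JobType="Default",
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    InputConfig={
        # Placeholder ARN of the versioned model package holding the Triton
        # container and the packed LLM artifact.
        "ModelPackageVersionArn": "arn:aws:sagemaker:us-east-1:111122223333:model-package/triton-llm/1",
        "ContainerConfig": {
            "SupportedInstanceTypes": [
                "ml.g5.8xlarge", "ml.g5.12xlarge", "ml.g5.16xlarge",
                "ml.inf2.2xlarge", "ml.inf2.8xlarge", "ml.inf2.24xlarge",
            ],
        },
    },
)
```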
Several jobs are spawned for each instance type (as shown in the Inference Recommender panel in SageMaker), and almost all of them fail with the error: Image size 12399514599 is greater than supported size 10737418240

- ml.inf2.24xlarge - 2 jobs:
  - 1 job fails with the image size error above
  - 1 job fails with "Benchmark failed to finish within job duration"
- ml.inf2.8xlarge - 3 jobs:
  - 2 jobs fail with the image size error above
  - 1 job fails with "Benchmark failed to finish within job duration"
- ml.g5.12xlarge - 4 jobs:
  - 3 jobs fail with the image size error above
  - 1 job completes successfully!!
Since the models I am experimenting with are LLMs, their size combined with that of the associated image exceeds the 10 GB threshold discussed in this community question.
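As a rough sanity check, I compare the size of the packed model artifact against the 10 GiB figure from the error message. The bucket and key below are placeholders, and the reported 12,399,514,599 bytes presumably also include the container image layers:

```python
import boto3

LIMIT_BYTES = 10_737_418_240  # 10 GiB, the "supported size" from the error message

s3 = boto3.client("s3")
# Placeholder bucket/key of the model.tar.gz referenced by the model package.
head = s3.head_object(Bucket="my-model-bucket", Key="triton/llm/model.tar.gz")
artifact_bytes = head["ContentLength"]

print(f"model artifact: {artifact_bytes / 1024**3:.2f} GiB")
print(f"headroom vs. limit: {(LIMIT_BYTES - artifact_bytes) / 1024**3:.2f} GiB")
```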
CloudWatch deep dive:
Looking into the logs associated with the Inferentia jobs, the following messages surface repeatedly:
The NVIDIA Driver was not detected. GPU functionality will not be available. Use the NVIDIA Container Toolkit to start this container with GPU support
[Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
CUDA memory pool disabled
Failed to load '/opt/ml/model/::router' version 1: Invalid argument: instance group router_0 of model router specifies invalid or unsupported gpu id 0. GPUs with at least the minimum required CUDA compute compatibility of 6.000000
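For reference, this is roughly how I am pulling these messages out of CloudWatch. The log group name is a placeholder for the group that each job's benchmark endpoint writes to:

```python
import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Placeholder log group; the real name belongs to the temporary endpoint that the
# recommendation job spins up (visible under /aws/sagemaker/Endpoints/ in CloudWatch).
response = logs.filter_log_events(
    logGroupName="/aws/sagemaker/Endpoints/sm-epc-placeholder",
    filterPattern="CUDA",  # narrow down to the driver/runtime complaints above
    limit=50,
)
for event in response["events"]:
    print(event["message"])
```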
My questions are:
1. What determines the number of jobs spawned for each instance type (GPU count, Inferentia core count)?
2. How can one use the Inference Recommender service for LLMs, considering that they routinely exceed the 10 GB AWS Lambda threshold?
3. Why does 1 job complete successfully on the ml.g5.12xlarge when the remaining jobs (for this instance and the others as well) fail with the image size error?
4. How does one avoid the "Benchmark failed to finish within job duration" error? (A rough job-duration sketch follows this list.)
5. Are there specific settings that one must account for when running recommendation jobs on Inferentia machines?
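Regarding the benchmark-duration error (question 4), this is the kind of change I would expect to help, assuming the JobDurationInSeconds field in InputConfig also applies to Default jobs; the values and ARNs are placeholders:

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Same Default job as above, but with a longer overall duration so the benchmark
# has more time to finish. 7200 seconds is an arbitrary placeholder value.
sm.create_inference_recommendations_job(
    JobName="triton-llm-recommender-longer",  # placeholder name
    JobType="Default",
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    InputConfig={
        "ModelPackageVersionArn": "arn:aws:sagemaker:us-east-1:111122223333:model-package/triton-llm/1",
        "JobDurationInSeconds": 7200,
    },
)
```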
Hello Giovanni,
This question is an upgraded repost of the questions I raised in the first link you attached to this answer (reposted because of the superficial treatment the topic received there). Unfortunately, none of the resources attached to your answer address the questions I am raising.