Mixtral 8x7B running slowly on my server


I tried to run Mixtral 8x7B on an AWS SageMaker instance with 192 GB of GPU memory (instance type ml.g5.48xlarge), but it's still running slowly.

It takes 132 seconds to generate 200 tokens.
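For reference, the reported numbers work out to roughly 1.5 tokens per second:

```python
# Throughput implied by the numbers in the post: 200 tokens in 132 seconds.
tokens, seconds = 200, 132
print(round(tokens / seconds, 2), "tokens/s")      # ~1.52 tokens/s
print(round(seconds / tokens * 1000), "ms/token")  # ~660 ms/token
```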

[Image: screenshot of the model-loading code]

You can see the code in the screenshot.

Everything is in place: the model is loaded on the GPUs, and all of them show vRAM in use. But when I start inference with the model, only one GPU does any processing; the GPU processors are idle except for the first GPU.

[Image: per-GPU utilization ("GPU-Util") readout]

I have tried the following:

  • torch_dtype=torch.bfloat16
  • low_cpu_mem_usage=True
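One argument the list above doesn't mention is `device_map`. As a minimal sketch (the kwargs and model id below are my assumption, not copied from the screenshot), assuming the model is loaded with Hugging Face transformers, `device_map="auto"` is what lets accelerate shard the layers across all visible GPUs, whereas `.to("cuda")` places everything on GPU 0:

```python
# Sketch only (not taken from the post): from_pretrained() kwargs that shard
# the model over all visible GPUs. torch_dtype also accepts a dtype string
# in recent transformers versions, equivalent to torch.bfloat16.
load_kwargs = dict(
    torch_dtype="bfloat16",
    low_cpu_mem_usage=True,
    device_map="auto",   # accelerate splits the layers across all GPUs
)
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mixtral-8x7B-Instruct-v0.1", **load_kwargs)
print(sorted(load_kwargs))
```

Note that `device_map="auto"` gives naive pipeline parallelism: layers are split across GPUs and executed one after another, so for a single request only one GPU is busy at any instant, which by itself shows up as low "GPU-Util" on the others.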
Maunish
Asked 3 months ago · 373 views
1 answer

Hi,

Do you have instance_type = "local_gpu"? With "local", the model may default to CPU instead of GPU.
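For local mode, the difference would look like this (a sketch using SageMaker Python SDK names; the `huggingface_model` variable is hypothetical):

```python
# "local" runs the serving container on the host's CPU; "local_gpu" uses its GPUs.
instance_type = "local_gpu"   # not "local", which stays on CPU
# predictor = huggingface_model.deploy(
#     initial_instance_count=1,
#     instance_type=instance_type,
# )
print(instance_type)
```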

Best,

Didier

AWS
EXPERT
Answered 3 months ago
  • As I said earlier:

    Everything is in place: the model is loaded on the GPUs, and all of them show vRAM in use. But when I start inference with the model, only one GPU is used; the GPU processors are idle except for the first GPU. Please refer to the second image ("GPU-Util") attached to the question.

    Thanks, Didier.
