Mixtral 8x7B running slowly on my server


I tried to run Mixtral 8x7B on an AWS SageMaker instance with 192 GB of GPU memory (instance type ml.g5.48xlarge), but it's still running slowly.

It takes 132 seconds to generate 200 tokens.
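For reference, the reported numbers work out to roughly 1.5 tokens per second:

```python
# Throughput implied by the numbers in the post: 200 tokens in 132 seconds.
tokens, seconds = 200, 132
print(round(tokens / seconds, 2), "tokens/s")      # ~1.52 tokens/s
print(round(seconds / tokens * 1000), "ms/token")  # ~660 ms/token
```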

[Image: screenshot of the model-loading code]

You can see the code in the screenshot.

Everything is in place: the model is loaded on the GPUs, and all of them show vRAM in use. But when I start inference with the model, only one GPU does any processing; the GPU processors are idle except for the first GPU.

[Image: per-GPU utilization ("GPU-Util") readout]

I have tried the following:

  • torch_dtype=torch.bfloat16
  • low_cpu_mem_usage=True
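One argument the list above doesn't mention is `device_map`. As a minimal sketch (the kwargs and model id below are my assumption, not copied from the screenshot), assuming the model is loaded with Hugging Face transformers, `device_map="auto"` is what lets accelerate shard the layers across all visible GPUs, whereas `.to("cuda")` places everything on GPU 0:

```python
# Sketch only (not taken from the post): from_pretrained() kwargs that shard
# the model over all visible GPUs. torch_dtype also accepts a dtype string
# in recent transformers versions, equivalent to torch.bfloat16.
load_kwargs = dict(
    torch_dtype="bfloat16",
    low_cpu_mem_usage=True,
    device_map="auto",   # accelerate splits the layers across all GPUs
)
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mixtral-8x7B-Instruct-v0.1", **load_kwargs)
print(sorted(load_kwargs))
```

Note that `device_map="auto"` gives naive pipeline parallelism: layers are split across GPUs and executed one after another, so for a single request only one GPU is busy at any instant, which by itself shows up as low "GPU-Util" on the others.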
Maunish
Asked 3 months ago · 373 views
1 answer

Hi,

Do you have instance_type = "local_gpu"? With "local", the model may default to CPU instead of GPU.
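For local mode, the difference would look like this (a sketch using SageMaker Python SDK names; the `huggingface_model` variable is hypothetical):

```python
# "local" runs the serving container on the host's CPU; "local_gpu" uses its GPUs.
instance_type = "local_gpu"   # not "local", which stays on CPU
# predictor = huggingface_model.deploy(
#     initial_instance_count=1,
#     instance_type=instance_type,
# )
print(instance_type)
```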

Best,

Didier

AWS
EXPERT
Answered 3 months ago
  • As I said earlier:

    Everything is in place: the model is loaded on the GPUs, and all of them show vRAM in use. But when I start inference with the model, only one GPU is used; the GPU processors are idle except for the first GPU. Please refer to the second image ("GPU-Util") attached to the question.

    Thanks, Didier.
