PyTorch DDP on Sagemaker


Hi, I am using PyTorch DDP on SageMaker. It is using MPI and running 4 separate processes, one on each of the 4 GPUs in a g4dn.12xlarge. In PyTorch, this is called ddp_spawn, right? Is there a way to force DDP instead of ddp_spawn on SageMaker? My distribution argument is

distribution = { 
    "pytorchddp": {
        "enabled": True,
        "custom_mpi_options": "-verbose -x NCCL_DEBUG=VERSION"
    }
}
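For context, a distribution block like the one above is normally passed to the SageMaker PyTorch estimator. The sketch below shows that wiring; the entry point, IAM role, S3 path, and framework versions are placeholders, not values from the question:

```python
from sagemaker.pytorch import PyTorch

# Minimal sketch; entry_point, role, versions, and the S3 path are placeholders.
estimator = PyTorch(
    entry_point="train.py",            # your DDP training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.g4dn.12xlarge",  # 4 x T4 GPUs
    framework_version="1.13.1",
    py_version="py39",
    distribution={
        "pytorchddp": {
            "enabled": True,
            "custom_mpi_options": "-verbose -x NCCL_DEBUG=VERSION",
        }
    },
)
# estimator.fit({"training": "s3://your-bucket/data"})  # placeholder S3 path
```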
  • SageMaker does use DDP: essentially it runs a separate process per GPU and uses DDP in each process to initialize and run training.
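On the one-process-per-GPU point: under an MPI-based launcher there is no spawning inside Python; each worker already exists and can derive its torch.distributed rank information from the Open MPI environment variables. A minimal sketch of that mapping follows; the helper name is mine, not a SageMaker API, and a real setup also needs MASTER_ADDR/MASTER_PORT:

```python
import os

def mpi_env_to_ddp_env(env=None):
    """Map Open MPI rank variables to the variables torch.distributed
    reads with init_process_group(..., init_method="env://").
    Illustrative helper, not part of SageMaker or PyTorch."""
    env = os.environ if env is None else env
    return {
        "RANK": env.get("OMPI_COMM_WORLD_RANK", "0"),
        "WORLD_SIZE": env.get("OMPI_COMM_WORLD_SIZE", "1"),
        "LOCAL_RANK": env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"),
    }

# Example: the last worker of a 4-GPU single-node job, as launched by mpirun -np 4
fake_env = {
    "OMPI_COMM_WORLD_RANK": "3",
    "OMPI_COMM_WORLD_SIZE": "4",
    "OMPI_COMM_WORLD_LOCAL_RANK": "3",
}
print(mpi_env_to_ddp_env(fake_env))
```

Each process would then bind to GPU `LOCAL_RANK` and call `init_process_group` once, which is plain DDP rather than ddp_spawn.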

Asked 1 year ago · Viewed 78 times
No answers
