Hi, I am using PyTorch DDP on SageMaker. It uses MPI and runs 4 separate processes, one per GPU, on the 4 GPUs of a g4dn.12xlarge. In PyTorch, this is called ddp_spawn, right? Is there a way to force DDP instead of ddp_spawn on SageMaker?
My `distribution` argument is:

```python
distribution = {
    "pytorchddp": {
        "enabled": True,
        "custom_mpi_options": "-verbose -x NCCL_DEBUG=VERSION"
    }
}
```
SageMaker does use DDP: essentially, it launches a separate process per GPU, and each process initializes DDP and runs the training loop.
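Since each MPI-launched process has to discover its own rank before it can call `torch.distributed.init_process_group`, here is a minimal sketch of that discovery step. The precedence (torch-style `RANK`/`WORLD_SIZE`/`LOCAL_RANK` first, then the Open MPI `OMPI_COMM_WORLD_*` variables) is an assumption about what the launcher exports, not SageMaker's documented contract:

```python
import os

def resolve_ddp_env(gpus_per_node=4):
    """Resolve (rank, world_size, local_rank) for one DDP worker process.

    Checks torch.distributed-style variables first, then falls back to the
    variables Open MPI's mpirun exports. The 4-GPU default matches a
    g4dn.12xlarge node; adjust for other instance types.
    """
    rank = int(os.environ.get("RANK",
                              os.environ.get("OMPI_COMM_WORLD_RANK", "0")))
    world_size = int(os.environ.get("WORLD_SIZE",
                                    os.environ.get("OMPI_COMM_WORLD_SIZE", "1")))
    local_rank = int(os.environ.get("LOCAL_RANK",
                                    os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK",
                                                   rank % gpus_per_node)))
    return rank, world_size, local_rank

# In the training script, each of the 4 processes would then do roughly:
#   torch.cuda.set_device(local_rank)
#   torch.distributed.init_process_group("nccl", rank=rank, world_size=world_size)
```

This is the classic one-process-per-GPU DDP pattern, which is distinct from ddp_spawn (where a single parent process forks the workers itself).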