Using data parallelization with SageMaker JumpStart


I am trying to train the Faster-RCNN model available on SageMaker JumpStart, and since the training data is large, I wonder whether the data parallelization feature can be used to finish the job faster. I set the environment variable "LAUNCH_SM_DDP_ENV_NAME" to True inside the estimator.JumpStartEstimator class and increased the number of instances to 10 (as an example). What happens is that it just launches 10 training jobs running in parallel, but the job does not finish any faster (in fact, it finishes in the same time as with 1 instance). Any hint is appreciated!

alex
asked 5 months ago · 2081 views
1 Answer

While I am not sure exactly which model you are using, I suggest taking a look at the training script that JumpStart uses and checking whether it contains any implementation of DDP (distributed data parallel).
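One rough way to do that check is to search the downloaded training script for names that usually show up when DDP is wired in. The sketch below is only a heuristic: the marker list is my own assumption, `find_ddp_markers` is a hypothetical helper, and a match (or no match) is a hint, not proof.

```python
# Names that commonly appear in a PyTorch script that sets up distributed
# data parallel training. This list is an assumption, not an official one.
DDP_MARKERS = (
    "DistributedDataParallel",
    "DistributedSampler",
    "init_process_group",
    "smdistributed",
)


def find_ddp_markers(script_text):
    """Return the sorted list of DDP-related markers found in the script text."""
    return sorted(marker for marker in DDP_MARKERS if marker in script_text)


# Example with an inline snippet; in practice, read the entry point file
# from the JumpStart training script bundle and pass its contents instead.
sample_script = """
import torch
model = torch.nn.Linear(4, 2)
"""
print(find_ddp_markers(sample_script))
```

An empty result suggests the script trains on each instance independently, which would match the behaviour described in the question.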

AWS
Marc
answered 5 months ago
  • As per the documentation, the only fine-tunable PyTorch object detection model on SageMaker JumpStart is "pytorch-od1-fasterrcnn-resnet50-fpn". I checked its training script and it does not seem to have DDP implemented, so I assume one cannot benefit from the DDP strategy with this model on JumpStart. I assume I will have to implement it myself by updating transfer_learning.py (the Docker image entry point).
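To illustrate why adding instances alone does not shorten the job: without DDP, every instance iterates over the full dataset, whereas a data parallel setup gives each worker only its own shard. The sketch below is a plain-Python illustration of strided sharding, similar in spirit to what PyTorch's `DistributedSampler` does with shuffling disabled. `shard_indices` is a hypothetical helper, not part of the JumpStart script, and a real change to the entry point would also need process group initialization and a `DistributedDataParallel` model wrapper.

```python
import math


def shard_indices(num_samples, num_replicas, rank):
    """Return the sample indices one worker would process in one epoch.

    Indices are padded by repeating from the start so every worker gets the
    same number of samples, then assigned in a strided fashion by rank.
    Assumes num_replicas is not much larger than num_samples.
    """
    per_replica = math.ceil(num_samples / num_replicas)
    total = per_replica * num_replicas
    indices = list(range(num_samples))
    indices += indices[: total - num_samples]  # pad to a multiple of num_replicas
    return indices[rank:total:num_replicas]


# With 1000 samples and 10 workers, each worker sees 100 samples per epoch
# instead of all 1000.
shards = [shard_indices(1000, 10, rank) for rank in range(10)]
print(len(shards[0]))
```

In the setup from the question, each of the 10 instances effectively behaves like a single worker over all the samples, which is why the wall-clock time stays the same.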
