
Which GPU instances are supported by the SageMaker algorithm forecasting-deepar?


I previously ran a hyperparameter tuning job for SageMaker DeepAR on the instance type ml.c5.18xlarge, but it does not seem powerful enough to complete the tuning job within the max_run time configured for my account. When I tried switching to the GPU-accelerated instance ml.g4dn.16xlarge, I received the error: "Instance type ml.g4dn.16xlarge is not supported by algorithm forecasting-deepar."

I cannot find any documentation listing the instance types supported by DeepAR. Which GPU/CPU instances offer more compute capacity than ml.c5.18xlarge that I could use for my tuning job?

If there aren't any, I would appreciate recommendations on how to shorten the job's run time. I need the tuning job to complete within the max run time of 432000 seconds. Thank you in advance!
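For context, here is a minimal sketch of the tuning setup in question, assuming the SageMaker Python SDK v2; the role ARN, S3 paths, hyperparameter values, and tuning range below are placeholders:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

session = sagemaker.Session()

# Resolve the built-in DeepAR container image for the current region.
image_uri = image_uris.retrieve("forecasting-deepar", session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::111122223333:role/MySageMakerRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.c5.18xlarge",               # the instance type from the question
    max_run=432000,                               # per-training-job limit, in seconds
    output_path="s3://my-bucket/deepar-output/",  # placeholder bucket
    sagemaker_session=session,
)
# DeepAR's required hyperparameters; values here are placeholders.
estimator.set_hyperparameters(
    time_freq="D",
    context_length="30",
    prediction_length="30",
    epochs="100",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="test:RMSE",   # a built-in DeepAR tuning metric
    objective_type="Minimize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-4, 1e-1),  # placeholder range
    },
    max_jobs=20,
    max_parallel_jobs=4,
)
# tuner.fit({"train": "s3://my-bucket/train/", "test": "s3://my-bucket/test/"})
```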

1 Answer

Hi, thanks for pointing this out. Indeed, the g4dn instances are currently not supported by the forecasting-deepar algorithm, and, as you rightly point out, this is currently not documented. I will raise this with the service team so it can be included in the documentation.

In the meantime, you can try the P3 instances instead; these are also powerful GPU instances and should help you speed up training.
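As a rough sketch of that change (region, role ARN, and bucket below are placeholders), only the instance_type on the estimator needs to be swapped:

```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator

# The DeepAR image URI is resolved per region; "us-east-1" is a placeholder.
image_uri = image_uris.retrieve("forecasting-deepar", "us-east-1")

estimator = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::111122223333:role/MySageMakerRole",  # placeholder
    instance_type="ml.p3.2xlarge",  # 1 V100 GPU; p3.8xlarge has 4, p3.16xlarge has 8
    instance_count=1,
    max_run=432000,
    output_path="s3://my-bucket/deepar-output/",  # placeholder
)
```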

AWS
answered 3 years ago
  • I appreciate the quick response @Heiko! I see that there are three P3 instance options available for training: 2xlarge, 8xlarge, and 16xlarge. It would be super helpful if you could confirm which of these are supported by DeepAR.

    Additionally, I was hoping you could help me understand how the instance_count parameter of the SageMaker Estimator class affects training time. My understanding is that this parameter sets the number of EC2 instances of the specified instance type that get allocated; for example, with instance_count=3, three EC2 instances, each a p3.2xlarge say, would be launched to parallelize training (see the sketch after this comment).

    If so, which would you say is better for improving training speed: a higher instance_count, or a single instance with more compute capacity? Thank you!
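For illustration, a minimal sketch of what instance_count=3 launches, again assuming the SageMaker Python SDK v2 (the image URI, role, and S3 paths are placeholders); how much the extra instances help depends on the algorithm's distributed-training support and on how the data is distributed across instances:

```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Placeholders for illustration only.
image_uri = image_uris.retrieve("forecasting-deepar", "us-east-1")
role = "arn:aws:iam::111122223333:role/MySageMakerRole"

# instance_count=3 asks SageMaker to launch three ml.p3.2xlarge instances
# for a single training job.
estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_type="ml.p3.2xlarge",
    instance_count=3,
    max_run=432000,
    output_path="s3://my-bucket/deepar-output/",
)

# ShardedByS3Key splits the channel's S3 objects across the three instances,
# instead of copying the full dataset to each one (the FullyReplicated default).
train_input = TrainingInput(
    "s3://my-bucket/train/",
    distribution="ShardedByS3Key",
)
# estimator.fit({"train": train_input})
```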
