Optimal notebook instance type for DeepAR in AWS SageMaker


I am currently using an ml.c4.2xlarge instance for a DeepAR use case, running an Automatic Model Tuning job. The data consists of 7157 time series, with 152 timesteps per series in the training set and 52 in the test set. I estimate the tuning job will take about 4-5 days on this instance type. I'm looking to find out whether DeepAR is engineered to take advantage of GPU computing for training, and whether it would be advisable to use a 'p' or 'g' compute instance instead for faster results. Recommendations as to which Accelerated Computing instance would be optimal for this scenario would also be great.

asked 2 years ago · 531 views
1 Answer
Accepted Answer

Yes - as detailed further on the algorithm details page, the SageMaker DeepAR algorithm implementation is able to train on GPU-accelerated instances to speed up more challenging jobs. There's also a handy reference table here listing all the SageMaker built-in algorithms and whether they're likely to benefit from GPU acceleration.

However, to be clear, it shouldn't be the notebook instance type that affects this. Typically when training models on SageMaker, the notebook provides your interactive compute environment, but you run training in training jobs - for example using the SageMaker Python SDK Estimator class, as shown in the sample notebooks for DeepAR on the electricity and synthetic datasets. The instance type you select for training is independent of the one you use for your notebook - for example, in the electricity notebook it's set as follows:

estimator = sagemaker.estimator.Estimator(
    image_uri=image_name,
    sagemaker_session=sagemaker_session,
    role=role,
    instance_count=1,  # <-- Setting training instance count (train_instance_count in SDK v1)
    instance_type="ml.c4.2xlarge",  # <-- Setting training instance type (train_instance_type in SDK v1)
    base_job_name="deepar-electricity-demo",
    output_path=s3_output_path,
)

So normally I wouldn't expect you to need to change your notebook instance type to speed up training - just edit the configuration of your training job from within the notebook.

Suggesting a particular type is tricky, because DeepAR hyperparameters like context_length, embedding_dimension, and mini_batch_size affect how much GPU capacity a given run needs. Since you're coming from a CPU-only baseline, I'd suggest starting small by trying out single-GPU ml.g4dn.xlarge, ml.g5.xlarge, or ml.p3.2xlarge instances, perhaps beginning with the lowest cost-per-hour. You can keep an eye on your jobs' GPUUtilization and GPUMemoryUtilization metrics to check whether utilization is low on instances like p3 with "bigger" GPUs.

Increasing mini_batch_size should help fill extra capacity on these and complete your job faster, but it will probably affect model convergence - so you may need to tune other parameters like learning_rate to compensate. Considering all of this, you may find trade-offs between speed and total cost, or speed and accuracy, for good hyperparameter combinations on your dataset. Of course, you could also scale up to multi-GPU instance types if you'd like to accelerate further.
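To make that concrete, here's a minimal sketch of the same kind of training job pointed at a single-GPU instance, reusing the role, sagemaker_session and s3_output_path variables from the snippet above. The instance type and hyperparameter values - including the weekly time_freq - are illustrative assumptions based on your described dataset, not tuned recommendations:

from sagemaker import image_uris
from sagemaker.estimator import Estimator

# Look up the DeepAR built-in algorithm container for the session's region
deepar_image = image_uris.retrieve("forecasting-deepar", sagemaker_session.boto_region_name)

estimator = Estimator(
    image_uri=deepar_image,
    sagemaker_session=sagemaker_session,
    role=role,
    instance_count=1,
    instance_type="ml.g4dn.xlarge",  # single-GPU starting point; try g5/p3 if it's underpowered
    base_job_name="deepar-gpu-test",
    output_path=s3_output_path,
)

estimator.set_hyperparameters(
    time_freq="W",          # assumption: weekly data - set to your actual series frequency
    context_length=52,      # illustrative value, worth tuning for your data
    prediction_length=52,   # assumption: matches your 52-step test window
    epochs=100,
    mini_batch_size=256,    # larger batches help fill GPU capacity...
    learning_rate=1e-3,     # ...but may need learning_rate re-tuned to compensate
)

Once a job like this is running, its GPUUtilization and GPUMemoryUtilization metrics show up in CloudWatch (and on the training job's detail page in the SageMaker console), which should tell you whether a bigger or smaller GPU would be a better fit.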

If I understood right, you're also using SageMaker Automatic Model Tuning to search these parameters - something like this XGBoost notebook with the HyperparameterTuner class?

In that case, I'd also mention:

  • Increasing the max_parallel_jobs parameter may accelerate the overall run time (by running more of the individual training jobs in parallel) - with a trade-off: the more jobs run in parallel, the less information from completed jobs is available when each new training job in the budget is kicked off. See the sketch after this list.
  • If you're planning to run this training regularly on a dataset which evolves over time, you probably don't need to run HPO each time: you'll likely see good results re-using your previously-optimized hyperparameters, unless something materially changes in the nature of the data and its patterns.
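As a rough sketch of what that tuning job setup might look like, reusing the GPU estimator from the snippet above - the metric, ranges and job counts here are placeholders to show where max_parallel_jobs fits, not recommendations:

from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="test:RMSE",  # one of DeepAR's built-in tuning metrics
    objective_type="Minimize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-4, 1e-2),
        "context_length": IntegerParameter(26, 104),
        "mini_batch_size": IntegerParameter(64, 512),
    },
    max_jobs=20,
    max_parallel_jobs=4,  # <-- higher = faster wall-clock time, but each new job starts with less prior information
)

# s3_train_path and s3_test_path are assumed names for your dataset's S3 channel URIs
tuner.fit({"train": s3_train_path, "test": s3_test_path})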
answered 2 years ago by Alex_T (AWS EXPERT)
