Why is my SageMaker training job slower than a notebook on studiolab.sagemaker.aws?

I trained a TensorFlow neural network on Studio Lab and got:

Epoch 145/4000
1941/1941 - 10s - ... - 10s/epoch - 5ms/step

Then I created a training job in script mode on ml.c5.xlarge:

import sagemaker
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(entry_point='untitled.py',
                       source_dir='./training/',
                       instance_type='ml.c5.xlarge',
                       instance_count=1,
                       output_path="s3://sagemaker-[skip]",
                       role=sagemaker.get_execution_role(),
                       framework_version='2.8.0',
                       py_version='py39',
                       hyperparameters={...},
                       metric_definitions=[...],
                       script_mode=True)

and it got:

Epoch 19/4000
1941/1941 - 49s - ... - 49s/epoch - 25ms/step

Why is it about 5 times slower than the Studio Lab notebook? Is it because of the instance type?

1 Answer

May I know which instance type you are using when you train in your notebook? Of the factors that influence training performance, the hardware spec of the training node is among the most critical. You might be bottlenecked on CPU, storage, or memory. See here for more details.
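One way to compare the two environments is to run the same small script in the Studio Lab notebook and at the top of the training entry point, then compare the printed values. The sketch below is only an illustration: the function name `describe_environment` is hypothetical, and it assumes TensorFlow may or may not be importable, so the GPU check is optional. Note that ml.c5.xlarge is a CPU-only instance with 4 vCPUs, so if the notebook runtime reports a GPU or more cores, that alone could explain a gap like this.

```python
import os
import platform


def describe_environment():
    """Collect basic hardware facts about the machine running this code."""
    info = {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),  # logical cores visible to Python
        "tensorflow": None,           # filled in only if TensorFlow imports
        "gpus": [],                   # names of GPUs TensorFlow can see
    }
    try:
        import tensorflow as tf  # optional; skipped if not installed

        info["tensorflow"] = tf.__version__
        info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the core counts and GPU lists match, the next things to check would be how the training data is read (local disk versus data downloaded from S3) and how much memory is available.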

EXPERT
answered 5 months ago
