SageMaker MXNet local mode not working


Hi, I am trying to fit an MXNet model in SageMaker local mode. I am adapting the code from this blog post: https://aws.amazon.com/blogs/machine-learning/use-the-amazon-sagemaker-local-mode-to-train-on-your-notebook-instance/ and doing the following:

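# Setup assumed here; the original post did not show how these were defined
import sagemaker
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet

sagemaker_session = sagemaker.Session()
role = get_execution_role()

bucket = 'XXXXXXXXXXX'
prefix = 'sagemaker/cifar-bench/data'

# Upload the local 'data' directory to S3
inputs = sagemaker_session.upload_data(
    path='data',
    bucket=bucket,
    key_prefix=prefix)

print('data sent to ' + inputs)

# Estimator configured to train in local mode on GPU
Inception = MXNet('gluon_cifar_net.py',
          role=role,
          train_instance_count=1,
          train_instance_type='local_gpu',
          framework_version='1.2.1',
          base_job_name='cifar10-inception-',
          hyperparameters={'batch_size': 256,
                           'optimizer': 'sgd',
                           'epochs': 100,
                           'learning_rate': 0.1,
                           'momentum': 0.9})

Inception.fit(inputs)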

This raises OSError: [Errno 2] No such file or directory.

In the traceback, the error appears to come from self.latest_training_job = _TrainingJob.start_new(self, inputs) and self.sagemaker_client.create_training_job(**train_request).

How can I make the local mode work?

AWS
EXPERT
asked 6 years ago, 562 views
2 Answers
Accepted Answer

Most likely docker-compose (or Docker itself) is not installed on the machine. Local mode launches the training container through these tools, so when the executable cannot be found you get No such file or directory.
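A quick way to test this diagnosis is to check whether both executables are on PATH before calling fit. The following is only a minimal sketch: the helper name missing_local_mode_tools is made up for illustration, and finding the tools on PATH does not prove that the Docker daemon is running or that GPU support is configured.

```python
import shutil

# Executables named in the answer above.
REQUIRED_TOOLS = ("docker", "docker-compose")


def missing_local_mode_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without Docker.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_local_mode_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("docker and docker-compose are both on PATH")
```

If the list is not empty, the OSError above is the expected outcome, and installing the missing tools is the first thing to try.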

If you want to use the GPU setup, I recommend running on a SageMaker notebook instance. Open one of the example notebooks, such as: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_cifar10/mxnet_cifar10_local_mode.ipynb

Then run the setup.sh cell. It installs and configures the Docker dependencies, after which you should be able to train MXNet locally on GPU.

answered 6 years ago

Hello,

I am attempting to run my estimator locally within a SageMaker Studio notebook, but I run into the following issue when running setup.sh:

!/bin/bash ./setup.sh

./setup.sh: line 3: sudo: command not found
The user does not have root access. Everything required to run the notebook is already installed and setup. We are good to go!

I am using the script from here: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_cnn_cifar10/setup.sh

I've read online that when you start a SageMaker notebook, you are supposed to have root access, but I do not. Please help.

Edit: Never mind. I found this issue, https://github.com/aws/amazon-sagemaker-examples/issues/1419, and see that Studio notebooks do not inherently support local mode.
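For anyone else who lands here from Studio, one workaround is to fall back to a managed training instance when the Docker tools are not available. This is only a rough sketch: choose_instance_type is a hypothetical helper, and ml.p3.2xlarge is an assumed default that you should replace with whatever instance type your account and budget allow.

```python
import shutil


def choose_instance_type(local_type="local_gpu",
                         managed_type="ml.p3.2xlarge",  # assumed default, change as needed
                         which=shutil.which):
    """Pick the local type only if docker and docker-compose are both on PATH."""
    has_tools = all(which(name) is not None
                    for name in ("docker", "docker-compose"))
    return local_type if has_tools else managed_type


if __name__ == "__main__":
    # The result can be passed as the estimator's instance type.
    print(choose_instance_type())
```

Note that a managed instance type runs a billable training job, unlike local mode.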

Samuel
answered a year ago
