SageMaker MXNet local mode not working


Hi, I am trying to fit an MXNet model locally. I am adapting the example from https://aws.amazon.com/blogs/machine-learning/use-the-amazon-sagemaker-local-mode-to-train-on-your-notebook-instance/ and running the following:

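import sagemaker
from sagemaker.mxnet import MXNet

# Assumed setup (not shown in the original post): session and execution role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

bucket = 'XXXXXXXXXXX'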
prefix = 'sagemaker/cifar-bench/data'

inputs = sagemaker_session.upload_data(
    path='data',
    bucket=bucket, 
    key_prefix=prefix)

print('data sent to ' + inputs)


Inception = MXNet('gluon_cifar_net.py', 
          role=role, 
          train_instance_count=1, 
          train_instance_type='local_gpu',
          framework_version='1.2.1',
          base_job_name='cifar10-inception-',
          hyperparameters={'batch_size': 256, 
                           'optimizer': 'sgd',
                           'epochs': 100, 
                           'learning_rate': 0.1, 
                           'momentum': 0.9})


Inception.fit(inputs)

This raises OSError: [Errno 2] No such file or directory.

In the traceback, the error appears to originate at self.latest_training_job = _TrainingJob.start_new(self, inputs) and then at self.sagemaker_client.create_training_job(**train_request).

How can I make the local mode work?

AWS
EXPERT
asked 6 years ago, 575 views
2 Answers
Accepted Answer

You most likely don't have docker-compose (or docker) installed on the machine. Local mode launches the training container through those tools, so when the executable is missing you get "No such file or directory".
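
One quick way to test this diagnosis is to check whether the two executables are on the PATH of the environment running the notebook. The sketch below is only an illustration: the helper name `missing_local_mode_tools` is hypothetical, and it assumes the tools are installed as standalone `docker` and `docker-compose` binaries, as named above.

```python
import shutil


def missing_local_mode_tools(tools=("docker", "docker-compose")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; it only looks up executables
    by name and does not verify that the Docker daemon is running.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_local_mode_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("docker and docker-compose are both on PATH")
```

If either name is reported as missing, that would be consistent with the OSError in the question. An empty result does not prove the setup is correct, since the GPU runtime configuration is not checked here.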

If you want to use the GPU setup, I would recommend running on a SageMaker notebook instance. Open one of the example notebooks, such as https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_cifar10/mxnet_cifar10_local_mode.ipynb, and run the setup.sh cell. This installs and configures the Docker dependencies, after which you should be able to train MXNet locally on GPU.

answered 6 years ago

Hello,

I am attempting to run my estimator locally within a SageMaker Studio Notebook but running into the following issue when running setup.sh:

!/bin/bash ./setup.sh

./setup.sh: line 3: sudo: command not found
The user does not have root access. Everything required to run the notebook is already installed and setup. We are good to go!

I am using the script from here: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_cnn_cifar10/setup.sh

I've read online that when you start a SageMaker notebook, you are supposed to have root access, but I do not. Please help.

Edit: Never mind. I found this issue, https://github.com/aws/amazon-sagemaker-examples/issues/1419, which indicates that Studio notebooks do not inherently support local mode.

Samuel
answered a year ago
