SageMaker MXNet local mode not working

Hi, I am trying to fit an MXNet model locally. I am adapting this blog post: https://aws.amazon.com/blogs/machine-learning/use-the-amazon-sagemaker-local-mode-to-train-on-your-notebook-instance/ and doing the following:

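import sagemaker
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet

# Session and role setup (not shown in the original post; assumed here)
sagemaker_session = sagemaker.Session()
role = get_execution_role()

bucket = 'XXXXXXXXXXX'
prefix = 'sagemaker/cifar-bench/data'

# Upload the local 'data' directory to S3; upload_data returns the S3 URI
inputs = sagemaker_session.upload_data(
    path='data',
    bucket=bucket,
    key_prefix=prefix)

print('data sent to ' + inputs)

# 'local_gpu' asks the SDK to run the training container on this machine
Inception = MXNet('gluon_cifar_net.py',
          role=role,
          train_instance_count=1,
          train_instance_type='local_gpu',
          framework_version='1.2.1',
          base_job_name='cifar10-inception-',
          hyperparameters={'batch_size': 256,
                           'optimizer': 'sgd',
                           'epochs': 100,
                           'learning_rate': 0.1,
                           'momentum': 0.9})

# Start training; this is the call that fails
Inception.fit(inputs)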

This returns OSError: [Errno 2] No such file or directory

In the error log, the failure appears to occur at self.latest_training_job = _TrainingJob.start_new(self, inputs) and then at self.sagemaker_client.create_training_job(**train_request).

How can I make the local mode work?

AWS
EXPERT
asked 6 years ago
575 views
2 Answers
Accepted Answer

It is very likely that you don't have docker-compose (or docker) installed on the machine, which is why you are getting "No such file or directory".
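The diagnosis above can be checked before calling fit. The following is a minimal sketch, not part of the SageMaker SDK: the helper names are hypothetical, and it assumes (as the answer suggests) that local mode launches docker and docker-compose as external programs. It shows that trying to run an executable that does not exist raises an OSError with errno 2, and it lists which of the expected tools are missing from PATH.

```python
import errno
import shutil
import subprocess


def missing_local_mode_tools(tools=("docker", "docker-compose")):
    """Return the names in `tools` that cannot be found on PATH.

    The default tool names are an assumption based on the answer above.
    """
    return [name for name in tools if shutil.which(name) is None]


def errno_for_missing_executable(name="definitely-not-a-real-binary-xyz"):
    """Try to run a nonexistent executable and return the errno of the failure.

    Returns None if the command unexpectedly exists and runs.
    """
    try:
        subprocess.run([name], check=False)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    # A missing executable surfaces as errno.ENOENT, i.e. [Errno 2]
    print("missing executable gives ENOENT:",
          errno_for_missing_executable() == errno.ENOENT)
    print("tools missing from PATH:", missing_local_mode_tools())
```

If the second line of output lists docker or docker-compose, installing them (for example via the setup.sh script mentioned below) is the likely fix.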

If you want to use the GPU setup, I would recommend running on a SageMaker notebook instance. Open one of the example notebooks, such as: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_cifar10/mxnet_cifar10_local_mode.ipynb

Then run the setup.sh cell. This installs and configures all the Docker dependencies, after which you should be able to use MXNet locally on GPU without any issue.

answered 6 years ago

Hello,

I am attempting to run my estimator locally within a SageMaker Studio notebook, but I am running into the following issue when running setup.sh:

!/bin/bash ./setup.sh

./setup.sh: line 3: sudo: command not found
The user does not have root access. Everything required to run the notebook is already installed and setup. We are good to go!

I am using the script from here: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_cnn_cifar10/setup.sh

I've read online that when you start a SageMaker notebook, you are supposed to have root access, but I do not. Please help.

Edit: Never mind. I found this issue: https://github.com/aws/amazon-sagemaker-examples/issues/1419 and see that Studio notebooks do not inherently support local mode.
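For notebooks that may run in either environment, one option is to fall back to a managed training instance when the local-mode prerequisites are absent. This is only a sketch under stated assumptions: choose_instance_type is a hypothetical helper, the tool names are assumed from the accepted answer, and 'ml.p3.2xlarge' is just an example of a managed GPU instance type, not a recommendation from this thread.

```python
import shutil


def choose_instance_type(local_type="local_gpu",
                         managed_type="ml.p3.2xlarge",
                         required_tools=("docker", "docker-compose")):
    """Return `local_type` only if every required tool is on PATH.

    Otherwise return `managed_type`, so training runs on a SageMaker-managed
    instance instead of failing locally. All defaults are illustrative.
    """
    if all(shutil.which(tool) is not None for tool in required_tools):
        return local_type
    return managed_type


if __name__ == "__main__":
    print("selected instance type:", choose_instance_type())
```

The returned string could then be passed as train_instance_type to the MXNet estimator shown in the question.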

Samuel
answered a year ago
