SageMaker MXNet local mode not working

Hi, I am trying to fit an MXNet model in SageMaker local mode. I am adapting the example from https://aws.amazon.com/blogs/machine-learning/use-the-amazon-sagemaker-local-mode-to-train-on-your-notebook-instance/ and running the following:

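import sagemaker
from sagemaker.mxnet import MXNet

# Assumed setup (not shown in the original post): default session and execution role.
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

bucket = 'XXXXXXXXXXX'
prefix = 'sagemaker/cifar-bench/data'

# Upload the local 'data' directory to S3; returns the S3 URI of the upload.
inputs = sagemaker_session.upload_data(
    path='data',
    bucket=bucket,
    key_prefix=prefix)

print('data sent to ' + inputs)

# 'local_gpu' runs the training container on this machine instead of a remote instance.
Inception = MXNet('gluon_cifar_net.py',
          role=role,
          train_instance_count=1,
          train_instance_type='local_gpu',
          framework_version='1.2.1',
          base_job_name='cifar10-inception-',
          hyperparameters={'batch_size': 256,
                           'optimizer': 'sgd',
                           'epochs': 100,
                           'learning_rate': 0.1,
                           'momentum': 0.9})

Inception.fit(inputs)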

which returns an OSError: [Errno 2] No such file or directory

In the traceback, the error appears to originate at self.latest_training_job = _TrainingJob.start_new(self, inputs) and self.sagemaker_client.create_training_job(**train_request)

How can I make the local mode work?

AWS
EXPERT
asked 6 years ago · 575 views
2 Answers
Accepted Answer

It is very likely that docker-compose (or Docker itself) is not installed on the machine. Local mode launches the training container through docker-compose, so a missing executable surfaces as OSError: [Errno 2] No such file or directory.
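One way to confirm this before calling fit is to check whether the executables can be found on PATH. The following is a minimal sketch, not part of the SageMaker SDK: the helper names are hypothetical, the list of tools is an assumption based on the answer above, and shutil.which requires Python 3.

```python
import shutil


def find_local_mode_tools(tools=("docker", "docker-compose")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found):
    """Return the sorted names of tools whose path could not be resolved."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_local_mode_tools()
    missing = missing_tools(found)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("docker and docker-compose were both found on PATH")
```

If either tool is reported as not found, the OSError above is the expected outcome, and installing Docker and docker-compose is the first thing to try.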

If you want to use the GPU setup, I would recommend running on a SageMaker notebook instance. Open one of the example notebooks, such as: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_cifar10/mxnet_cifar10_local_mode.ipynb

Then run the setup.sh cell. This installs and configures the Docker dependencies, after which you should be able to use MXNet locally on GPU.
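If the same notebook may also run on a machine without a GPU, the instance type can be chosen at run time instead of hard-coding local_gpu. This is a minimal sketch under the assumption that nvidia-smi exiting with status 0 is a good enough signal that a GPU is usable; pick_local_instance_type is a hypothetical helper, not an SDK function.

```python
import shutil
import subprocess


def pick_local_instance_type(probe="nvidia-smi"):
    """Return 'local_gpu' if the probe command exists and exits with 0, else 'local'."""
    if shutil.which(probe) is None:
        return "local"
    try:
        # Discard the probe's output; only its exit status matters here.
        result = subprocess.run(
            [probe], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        return "local"
    return "local_gpu" if result.returncode == 0 else "local"


if __name__ == "__main__":
    print(pick_local_instance_type())
```

The returned string could then be passed as train_instance_type in the estimator from the question.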

answered 6 years ago

Hello,

I am attempting to run my estimator locally within a SageMaker Studio notebook, but I am running into the following issue when running setup.sh:

!/bin/bash ./setup.sh

./setup.sh: line 3: sudo: command not found
The user does not have root access. Everything required to run the notebook is already installed and setup. We are good to go!

I am using the script from here: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_cnn_cifar10/setup.sh

I've read online that when you start a SageMaker notebook, you are supposed to have root access, but I do not. Please help.

Edit: Never mind. I found this issue, https://github.com/aws/amazon-sagemaker-examples/issues/1419, which says that Studio notebooks do not inherently support local mode.

Samuel
answered 1 year ago
