SageMaker MXNet local mode not working

Hi, I am trying to fit an MXNet model in local mode. I am adapting this blog post, https://aws.amazon.com/blogs/machine-learning/use-the-amazon-sagemaker-local-mode-to-train-on-your-notebook-instance/, and doing the following:

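# Setup assumed from the linked blog post (not shown in the original snippet)
import sagemaker
from sagemaker.mxnet import MXNet

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

bucket = 'XXXXXXXXXXX'
prefix = 'sagemaker/cifar-bench/data'

# Upload the local 'data' directory to S3; upload_data returns the S3 URI
inputs = sagemaker_session.upload_data(
    path='data',
    bucket=bucket,
    key_prefix=prefix)

print('data sent to ' + inputs)

# 'local_gpu' runs the training container on this machine via Docker
Inception = MXNet('gluon_cifar_net.py',
                  role=role,
                  train_instance_count=1,
                  train_instance_type='local_gpu',
                  framework_version='1.2.1',
                  base_job_name='cifar10-inception-',
                  hyperparameters={'batch_size': 256,
                                   'optimizer': 'sgd',
                                   'epochs': 100,
                                   'learning_rate': 0.1,
                                   'momentum': 0.9})

Inception.fit(inputs)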

The call to fit raises OSError: [Errno 2] No such file or directory

In the traceback, the error appears to come from self.latest_training_job = _TrainingJob.start_new(self, inputs) and self.sagemaker_client.create_training_job(**train_request)

How can I make the local mode work?

AWS EXPERT
asked 6 years ago · 576 views
2 Answers
Accepted Answer

It is very likely that you don't have docker-compose (or docker) installed on the machine. Local mode runs the training container through docker-compose, so a missing executable shows up as OSError: [Errno 2] No such file or directory.
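A quick way to confirm this before calling fit is to check whether the executables are on the PATH. The snippet below is a minimal sketch, not part of the SageMaker SDK: the helper name missing_local_mode_tools is hypothetical, and the tool names are an assumption based on the answer above (the exact command local mode invokes may differ between SDK versions).

```python
import shutil


def missing_local_mode_tools(tools=("docker", "docker-compose")):
    """Return the names of the given executables that are not on PATH.

    The default tool names are an assumption taken from the answer above.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_local_mode_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("docker and docker-compose were both found")
```

If the list is non-empty, installing the missing tools (for example with the setup.sh script mentioned below) is the first thing to try.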

If you want to use the GPU setup, I would recommend running on a SageMaker notebook instance. Open one of the example notebooks, such as https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_cifar10/mxnet_cifar10_local_mode.ipynb

Then run the setup.sh cell. This installs and configures the Docker dependencies, after which you should be able to train MXNet locally on the GPU.
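If it is unclear whether the machine actually has a usable GPU, the instance type can be chosen at runtime instead of hard-coding 'local_gpu'. This is a hedged sketch: the function name pick_local_instance_type is hypothetical, and it assumes that a successful nvidia-smi run is a reasonable signal that a GPU and driver are present.

```python
import shutil
import subprocess


def pick_local_instance_type():
    """Return 'local_gpu' if nvidia-smi runs successfully, otherwise 'local'."""
    # Assumption: nvidia-smi on PATH and exiting with 0 means a GPU is usable.
    if shutil.which("nvidia-smi") is None:
        return "local"
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return "local"
    return "local_gpu" if result.returncode == 0 else "local"


if __name__ == "__main__":
    print(pick_local_instance_type())
```

The returned string can then be passed as train_instance_type when constructing the estimator.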

answered 6 years ago

Hello,

I am attempting to run my estimator locally within a SageMaker Studio Notebook, but I am running into the following issue when running setup.sh:

!/bin/bash ./setup.sh

./setup.sh: line 3: sudo: command not found
The user does not have root access. Everything required to run the notebook is already installed and setup. We are good to go!

I am using the script from here: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_cnn_cifar10/setup.sh

I've read online that you are supposed to have root access when you start a SageMaker notebook, but I do not. Please help.

Edit: Never mind. I found this issue, https://github.com/aws/amazon-sagemaker-examples/issues/1419, and see that Studio Notebooks do not inherently support local mode.

Samuel
answered 1 year ago
