Sagemaker notebook jobs dependency

0

I am currently building a workflow that automatically trains a model, updates the endpoint, and logs the process, using a Jupyter notebook script that I schedule as a SageMaker notebook job. I did all my script building and testing in JupyterLab on a SageMaker notebook instance. Now I am facing a question: can I turn off my notebook instance after I have created a scheduled job on it? Do the jobs run independently of the notebook instance? For context, the .ipynb script handles the SageMaker SDK installation, retrieves data from an S3 bucket, and splits it locally into several files in the same directory the script is in. It also writes the training log to an S3 bucket. All the permissions (SageMaker, S3, etc.) are set up through a role assigned to the notebook instance during testing, and the same role will be used in the job definition. Would this job work? And would it work while the notebook instance I used for testing is shut down? More generally, is this good practice? Apologies if this is an immature development approach; it is my first time using Amazon services to implement a training process.

Yun
asked a month ago · 466 views
1 Answer
1
Accepted Answer

Yes, you can stop your SageMaker notebook instance after creating a scheduled notebook job on it. The jobs will run independently from the notebook instance.

When you create a scheduled notebook job, it uses the IAM role specified in the job definition. If you reuse the role attached to your notebook instance, as you describe, the job will have the same access to your S3 buckets and be able to call the same SageMaker APIs.

The job definition specifies the notebook path and schedule; it does not depend on the original notebook instance remaining running. The jobs will execute on schedule with the permissions granted by the IAM role.

It is generally not required to keep the original notebook instance running after creating scheduled jobs. You can stop the instance to avoid ongoing compute costs. The jobs will still run as scheduled.

For best practices, consider using SageMaker Pipelines for more advanced workflows that chain multiple jobs together based on dependencies. You can define pipelines that run Job A, then Job B, etc.

EXPERT
answered a month ago
