Can SageMaker notebook jobs access studio storage?


I'm using SageMaker Studio, and I have my data files as well as a requirements.txt organized under my home directory. Everything works fine when I run notebook kernels interactively: they can access my files without issue. However, when I create a "notebook job", it doesn't seem to have access to any of my files. Is there a way to give my notebook job access to the same file system as my interactive notebooks?

After I run a job, I see that a folder for the job was created within the input S3 bucket, and within that folder there's an "input/" subfolder. But I don't know how to predict the name of the temp folder created for the job, so it doesn't seem like I could drop additional inputs in there myself, even if I wanted to. And if I could, how would I find them at run time?

I could use some guidance on how my notebook jobs can access input files.

Thanks,

Chris

  • I tried creating an explicit inputs folder in the S3 bucket, created and populated various subfolders within it, and then specified that URI as the input S3 URI. However, SageMaker still created a temp folder within that URI, with its own "input" subfolder, into which it put the notebook and initialization script. So it doesn't look like I can proactively stage inputs in S3, since the input folder is always created dynamically inside a temp folder created for the job.

asked a year ago · 774 views
1 Answer
Accepted Answer

Hi Chris, the way to use input files is to reference their S3 URIs in the notebook itself, i.e., instead of reading from an inputs folder in your local EFS storage (which doesn't get copied over to the input folder for the training job), read the inputs directly from S3. If the inputs will be dynamic across your notebook jobs, use parameterized executions (reference: https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html).
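A minimal sketch of what that looks like inside the notebook, assuming placeholder bucket, prefix, and file names (substitute your own). If you tag the cell holding these values as a "parameters" cell, a parameterized execution should be able to override them per job, per the linked docs:

```python
import boto3
import pandas as pd

# Hypothetical names -- substitute your own bucket/prefix/file.
input_bucket = "my-example-bucket"
input_prefix = "notebook-job-inputs"

s3 = boto3.client("s3")

# Copy one input from S3 into the job's local (ephemeral) storage, then read it.
s3.download_file(input_bucket, f"{input_prefix}/data.csv", "data.csv")
df = pd.read_csv("data.csv")

# Alternatively, pandas can read the s3:// URI directly if s3fs is installed:
# df = pd.read_csv(f"s3://{input_bucket}/{input_prefix}/data.csv")
```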

Durga_S (AWS), answered a year ago
EXPERT, reviewed 13 hours ago
  • Thank you for the answer! So just to be clear, I'd pull all input files down from S3 at the start of the notebook, essentially using ephemeral storage that's specific to the job (a sketch of this follows below)? And I presume that storage is truly job-specific and cleaned up at the end of the job?

  • Exactly! Notebook executions run as training jobs, so the compute and storage are ephemeral.

  • Okay great, thanks again!
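For the "pull everything down at the start of the notebook" approach discussed above, here is a hedged sketch (bucket and prefix names are placeholders) that mirrors an S3 prefix into a local folder on the job's ephemeral storage; that storage belongs to the underlying training job and is discarded when the job ends:

```python
import os
import boto3

# Hypothetical names -- substitute your own bucket and prefix.
input_bucket = "my-example-bucket"
input_prefix = "notebook-job-inputs/"
local_dir = "inputs"

bucket = boto3.resource("s3").Bucket(input_bucket)

# Mirror everything under the prefix into the job's local working directory.
for obj in bucket.objects.filter(Prefix=input_prefix):
    rel_path = obj.key[len(input_prefix):]
    if not rel_path or rel_path.endswith("/"):  # skip zero-byte "folder" markers
        continue
    dest = os.path.join(local_dir, rel_path)
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    bucket.download_file(obj.key, dest)
```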
