Can SageMaker notebook jobs access Studio storage?


I'm using SageMaker Studio, and I have my data files as well as a requirements.txt organized under my home directory. Everything works when I run notebook kernels interactively: they can access my files just fine. However, when I create a "notebook job", it doesn't seem to have access to any of my files. Is there a way to give my notebook job access to the same file system as my interactive notebooks?

After I run a job, I see that a folder for the job was created within the input S3 bucket, and within that folder there's an "input/" subfolder. But I don't know how to predict the name of the temp folder created for the job, so it doesn't seem like I could drop additional inputs in there myself, even if I wanted to. And if I could, how would I find them at run time?

I could sure use some guidance on how my notebook jobs can access input files.

Thanks,

Chris

  • I tried creating an explicit inputs folder in the S3 bucket, creating and populating various subfolders in there, and then specifying that URI as the input S3 URI. However, SageMaker still created a temp folder within that URI, with its own "input" subfolder, in which it put the notebook and initialization script. So it doesn't look like I can proactively stage inputs in S3, given that the input folder is always created dynamically, within a temp folder created for the job.

posted a year ago · 806 views
1 Answer
Accepted Answer

Hi Chris, the way to use input files is to reference their S3 URIs directly in the notebook itself, i.e., instead of reading from an inputs folder in your local EFS storage (which doesn't get copied over to the input folder for the training job), read the inputs directly from S3. If the inputs will be dynamic across your notebook jobs, use parameterized executions (reference: https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html).
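A minimal sketch of how that could look inside the notebook, under a couple of assumptions: the bucket and key below are hypothetical, pandas needs the s3fs package to read s3:// URIs (it could go in your requirements.txt), and the "parameters" cell tag is the hook that parameterized executions override at submission time:

```python
# Cell tagged "parameters": defaults that a parameterized notebook job
# can override when the job is submitted. This URI is hypothetical.
input_s3_uri = "s3://my-bucket/inputs/data.csv"
```

```python
# Read the input straight from S3 instead of the Studio home directory
# (which isn't mounted in the notebook job's container). pandas handles
# s3:// URIs via the s3fs package.
import pandas as pd

df = pd.read_csv(input_s3_uri)
print(df.shape)
```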

AWS
Durga_S
answered a year ago
EXPERT
verified 14 days ago
  • Thank you for the answer! So just to be clear, I'd pull all input files down from S3 at the start of the notebook, essentially using ephemeral storage that's specific to the job? And I presume that storage is truly job-specific and cleaned up at the end of the job?

  • Exactly! A notebook execution runs as a training job, so the compute and storage are ephemeral. (A minimal download sketch follows this thread.)

  • Okay great, thanks again!
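For the pull-everything-down approach discussed in the comments above, a minimal sketch, assuming a hypothetical bucket and prefix; it copies every object under the prefix to a local inputs/ directory on the job's ephemeral storage:

```python
import os

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical
prefix = "inputs/"    # hypothetical
local_dir = "inputs"

# Walk the prefix with a paginator so more than 1000 objects still work.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder objects
        local_path = os.path.join(local_dir, os.path.relpath(key, prefix))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(bucket, key, local_path)
```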
