Hi Chris, the way to use input files is to reference the S3 URIs directly in the notebook itself, i.e., instead of reading from an inputs folder in your local EFS storage (which doesn't get copied over to the training job's inputs folder), read the inputs directly from their S3 URIs. If the inputs will be dynamic across your notebook jobs, use parameterized executions (reference: https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html).
Thank you for the answer! So just to be clear, I'd suck all input files down from S3 at the start of the notebook, essentially using ephemeral storage that's specific to the job? And I presume that storage is truly job-specific and cleaned up at the end of the job?
Exactly! Notebook executions run as training jobs, so the compute and storage are ephemeral.
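For reference, a sketch of that "pull everything down at the start of the notebook" approach; the bucket name, prefix, and local directory below are hypothetical placeholders:

```python
# Download all input files from S3 into the job's ephemeral local storage
# at the start of the notebook; this storage goes away when the job ends.
import os
import boto3

bucket = "my-example-bucket"
prefix = "inputs/"
local_dir = "/tmp/inputs"
os.makedirs(local_dir, exist_ok=True)

s3 = boto3.resource("s3")
for obj in s3.Bucket(bucket).objects.filter(Prefix=prefix):
    if obj.key.endswith("/"):   # skip "folder" placeholder keys
        continue
    target = os.path.join(local_dir, os.path.relpath(obj.key, prefix))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    s3.Bucket(bucket).download_file(obj.key, target)
```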
Okay great, thanks again!
I tried creating an explicit inputs folder in the S3 bucket, creating and populating various subfolders within it, and then specifying that URI as the input S3 URI. However, SageMaker still created a temp folder within that URI, with its own "input" subfolder, in which it put the notebook and initialization script. So it doesn't look like I can proactively stage inputs in S3, given that the input folder is always created dynamically, inside a temp folder created for the job.