How/where is data copied in sagemaker pipelines?

Question

This example is contrived for the question. As I understand it, when we create a SageMaker pipeline with steps to process data and then run training, the data is copied to a local instance, into some opt/ml/.. directory I would assume. This is S3 File mode for training. Suppose I pull the data down manually from a notebook or terminal and copy it to wherever SageMaker expects it. When I run the pipeline, how can I tell SageMaker that the data is already present on the local instance, so that it doesn't have to download it from the S3 URI?

Answer

From the question above, what I understood is that you have already copied the data to the local instance and would like to avoid copying it again with each pipeline run.

With SageMaker SDK **local mode**, you can specify a local path instead of an S3 URL; the local files/dataset will then be used instead of being downloaded from S3.

This documentation shows how to specify local mode and its inputs: [https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode](https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode)
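
As a rough illustration of the local-mode approach, here is a minimal sketch using the SageMaker Python SDK. It assumes Docker is available on the machine running the SDK and that the SDK was installed with its local extras. The function names `local_channel_uri` and `train_locally`, the channel name `training`, and the `image_uri`, `role`, and `data_dir` arguments are placeholders for illustration, not values from the question.

```python
from pathlib import Path


def local_channel_uri(data_dir: str) -> str:
    """Turn a local directory into a file:// URI for a local-mode input channel."""
    # as_uri() requires an absolute path, so resolve it first.
    return Path(data_dir).expanduser().resolve().as_uri()


def train_locally(image_uri: str, role: str, data_dir: str) -> None:
    """Run one training job in local mode against data already on disk."""
    # Imported lazily so the helper above works without the SDK installed.
    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri=image_uri,    # assumption: your own training image
        role=role,              # assumption: your own execution role ARN
        instance_count=1,
        instance_type="local",  # run the container on this machine
    )
    # A file:// input is read from local disk rather than downloaded from S3.
    estimator.fit({"training": local_channel_uri(data_dir)})
```

Calling `train_locally(image_uri, role, "/path/to/data")` would then start the training container on the same machine, with the `training` channel backed by the local directory. Note that this applies to the machine running the SDK; it is not a way to pre-stage data on a managed training instance.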
