I have a training step (sample code below) in my SageMaker pipeline. Once training is done, everything inside the /opt/ml/model directory is compressed into a model.tar.gz file, and SageMaker uploads it to an S3 location.

Can I access this model.tar.gz file before SageMaker uploads it to the S3 bucket? Say that, from my training step, I want to access the file once training is done and before SageMaker uploads it to the S3 output location. Is it saved locally on the training instance before the upload?

If this is not possible, can I define a processing step that runs after the training step and downloads the model from wherever SageMaker saved it (the S3 URI)? If I give the processing step the S3 location of the model, will SageMaker download it automatically, or do I need to write code to download it?

from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

xgb_train = Estimator(
    image_uri="some_uri",
    instance_type=instance_type,
    instance_count=1,
    output_path=model_path,
    role=role,
    sagemaker_session=pipeline_session,
)

# Training code
train_args = xgb_train.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
    }
)

Can you share what you plan to do once you have access to the tar file? You can write a script that downloads the model from S3 in a processing step, but it will be downloaded to the processing instance, which is also ephemeral.
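
For what it's worth, here is a rough sketch of what such a processing script could look like. It assumes the model's S3 URI is passed to the processing step as a ProcessingInput, in which case SageMaker copies the archive into the container before the script starts and no download code is needed. The mount path /opt/ml/processing/model and the helper name extract_model are my own choices for illustration, not something from your pipeline.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, dest_dir):
    """Unpack a model.tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return sorted(names)


if __name__ == "__main__":
    # Assumed mount point: whatever destination the ProcessingInput uses.
    model_dir = Path("/opt/ml/processing/model")
    archive = model_dir / "model.tar.gz"
    if archive.exists():
        print(extract_model(archive, model_dir / "extracted"))
    else:
        print(f"No archive found at {archive}")
```

On the pipeline side, the input would be something like ProcessingInput(source=step_train.properties.ModelArtifacts.S3ModelArtifacts, destination="/opt/ml/processing/model"), where step_train is a placeholder name for your TrainingStep. As noted above, the extracted files live only as long as the processing instance does, so anything you want to keep has to be written to a processing output.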