How do I create a training step in a SageMaker pipeline?

I have the following project structure. I clone my project in SageMaker Studio and create a SageMaker pipeline (sample below). In the processing step, I can pass a ProcessingInput that points to my utils folder, which holds additional helper code (source="src/utils"). I assume the folder gets copied to the SageMaker instance, because I can use the helper.py module in my processing.py. This setup works. I want to use a similar construct for my training step, but I don't see in the documentation where I can pass similar inputs to the training step. How can I specify in my training step that I want to copy an additional code folder to the training instance and use those helper methods?

pipeline_project
      src
          processing.py
          train.py
      utils
          helper.py

from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

script_processor = ScriptProcessor(
    command=['python3'],
    image_uri='image_uri',
    role='role_arn',
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

step_process = ProcessingStep(
    name="ProcessStep",
    processor=script_processor,
    code='src/processing.py',
    inputs=[
        # Copies the local helper folder into the processing container
        ProcessingInput(
            input_name="utils",
            source="src/utils",
            destination="/opt/ml/processing/input/src/utils",
        )
    ],
)
asked a year ago, 607 views
1 Answer
When you create a training step, you configure it through an Estimator object (see the xgb_estimator object in the code below). The estimator accepts a source_dir argument. The SDK uploads the contents of that directory together with the entry_point script, so you can place additional code dependencies at that location and import them from your training script.

Create the Estimator

from sagemaker.xgboost.estimator import XGBoost

xgb_estimator = XGBoost(
    entry_point="abalone.py",
    source_dir="code",
    hyperparameters=hyperparameters,
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    framework_version="1.0-1",
)
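
Applied to the project layout in the question, the same idea might look like the sketch below. It is an illustration under assumptions, not a tested pipeline: the paths come from the question's directory tree, training_code_kwargs and build_estimator are hypothetical helper names, and it relies on the dependencies argument of the SDK's framework estimators, which copies extra folders next to the entry point. Check the behavior against the SDK version you use.

```python
from pathlib import Path

# Assumption: paths are relative to where the pipeline definition runs.
PROJECT_ROOT = Path("pipeline_project")


def training_code_kwargs(project_root: Path) -> dict:
    """Estimator keyword arguments that ship train.py plus the utils folder."""
    return {
        # Script to run, given relative to source_dir
        "entry_point": "train.py",
        # Everything in this folder is uploaded with the training job
        "source_dir": str(project_root / "src"),
        # Extra folders copied alongside the entry point in the container
        "dependencies": [str(project_root / "utils")],
    }


def build_estimator(role, hyperparameters, sagemaker_session=None):
    """Hypothetical helper: create the XGBoost estimator with the extra code."""
    # Imported lazily so this file can be loaded without the SDK installed
    from sagemaker.xgboost.estimator import XGBoost

    return XGBoost(
        role=role,
        hyperparameters=hyperparameters,
        instance_count=1,
        instance_type="ml.m5.2xlarge",
        framework_version="1.0-1",
        sagemaker_session=sagemaker_session,
        **training_code_kwargs(PROJECT_ROOT),
    )
```

If utils already lives inside src, source_dir alone should be enough and the dependencies entry can be dropped.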

Provide the Estimator as an argument to the Training step

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep
from sagemaker.xgboost.estimator import XGBoost

pipeline_session = PipelineSession()

xgb_estimator = XGBoost(..., sagemaker_session=pipeline_session)

step_args = xgb_estimator.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            content_type="text/csv"
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv"
        )
    }
)

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=step_args,
)
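
On the script side, a train.py for this step could locate the "train" and "validation" channels roughly as follows. This is a minimal sketch that assumes a script-mode container, where the training toolkit sets the SM_CHANNEL_* and SM_MODEL_DIR environment variables. The fallback paths are the standard /opt/ml locations, and the commented helper import is hypothetical and only works if the utils folder was shipped next to the script.

```python
import argparse
import os


def parse_args(argv=None):
    """Resolve data and model locations inside the training container."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--validation",
        default=os.environ.get(
            "SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"
        ),
    )
    parser.add_argument(
        "--model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # Hyperparameters arrive as extra flags; this sketch ignores unknown ones
    args, _unknown = parser.parse_known_args(argv)
    return args


def main(argv=None):
    args = parse_args(argv)
    # Assumption: utils/ was copied next to this script (source_dir or
    # dependencies), so the question's helper module would be importable:
    # from utils import helper
    print(f"train={args.train} validation={args.validation}")
    print(f"model_dir={args.model_dir}")
    return args


if __name__ == "__main__":
    main()
```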
Ashish
answered a year ago
