Pipeline training step's custom output

0

A pipeline training step saves a custom JSON file to the output path, which is set through the estimator's output_path parameter, as shown below:

estimator = TensorFlow(
    entry_point=code_entry,
    source_dir=code_dir,
    output_path='s3://some-bucket/results/',
    [... other params...]
)

There seems to be no step property for accessing custom output files, unlike model artifacts, which are available through step.properties.ModelArtifacts.S3ModelArtifacts.

Also, unlike ProcessingStep, TrainingStep has no outputs argument that would let other steps read the S3 URI from the step arguments, for example something like this:

step.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]

How can one access the custom outputs of a training step, stored in the output.tar.gz?

  • Running into the same issue and found your question. Did you find a solution by any chance?

2 Answers
0

There should be a way to achieve this. The properties attribute mirrors the response of the corresponding Describe* API call. The DescribeTrainingJob response includes an OutputDataConfig field; see https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html#API_DescribeTrainingJob_ResponseSyntax

So it should be something like step.properties.OutputDataConfig.S3OutputPath
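
Note that S3OutputPath is the prefix configured on the estimator, not the archive itself. As a rough sketch, assuming the training job writes its custom outputs to <output_path>/<training job name>/output/output.tar.gz (an assumption about the S3 layout, worth verifying in your own bucket), the full URI could be composed like this. The function name and the example job name are hypothetical:

```python
def output_archive_uri(s3_output_path: str, training_job_name: str) -> str:
    """Compose the assumed S3 URI of a training job's output.tar.gz."""
    # Assumed layout: <output_path>/<training job name>/output/output.tar.gz
    prefix = s3_output_path.rstrip("/")
    return f"{prefix}/{training_job_name}/output/output.tar.gz"


# "example-job" is a placeholder for the real training job name.
print(output_archive_uri("s3://some-bucket/results/", "example-job"))
# prints s3://some-bucket/results/example-job/output/output.tar.gz
```

Inside a pipeline definition these values are only known at execution time, so a plain f-string would not work there. The same concatenation would presumably have to be expressed with sagemaker.workflow.functions.Join over step.properties.OutputDataConfig.S3OutputPath and step.properties.TrainingJobName; that variant is untested here.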

AWS
answered a year ago
  • Having the same problem. I tried setting an input to my evaluation job with the source "step.properties.OutputDataConfig.S3OutputPath", but it doesn't evaluate to the correct S3 path. Any other ideas?

-1

Hi,

You need to set the output path on the estimator (an EstimatorBase subclass) that is passed to the training step.

Examples

from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

# role_arn, region, scope_parameter, kms_key_arn, transform_step and the
# transform_*_output_name variables are assumed to be defined elsewhere.
estimator_example = Estimator(
    base_job_name="example",
    role=role_arn,
    instance_count=1,
    output_path="s3://example-bucket/my-model",
    environment={"region": region.name, "scope": scope_parameter},
    output_kms_key=kms_key_arn
)

training_step = TrainingStep(
    name="ExampleModelTraining",
    estimator=estimator_example,
    inputs={
        "training_data": TrainingInput(
            s3_data=transform_step.properties.ProcessingOutputConfig.Outputs[
                transform_training_data_output_name
            ].S3Output.S3Uri
        ),
        "training_target": TrainingInput(
            s3_data=transform_step.properties.ProcessingOutputConfig.Outputs[
                transform_training_target_output_name
            ].S3Output.S3Uri
        ),
    },
)

Thanks,

AWS
Jady
answered a year ago
  • Hi! As you can see in my question, I'm already providing output_path in my estimator. The question is: how can other pipeline steps access the custom outputs generated by the training step?
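
Once a downstream step has the output.tar.gz archive available locally (for example, downloaded or mounted as an input), reading the custom JSON out of it might look like the sketch below. It uses only the Python standard library. The function name and the member name passed to it are hypothetical and depend on what the training script actually wrote:

```python
import json
import tarfile


def read_json_from_archive(archive_path: str, member_name: str) -> dict:
    """Return the parsed JSON stored under member_name in a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # extractfile raises KeyError if the member does not exist and
        # returns None if the member is not a regular file.
        member = archive.extractfile(member_name)
        if member is None:
            raise FileNotFoundError(
                f"{member_name} is not a regular file in {archive_path}"
            )
        return json.load(member)
```

A call such as read_json_from_archive("output.tar.gz", "metrics.json") would then return the file's contents as a dictionary, where "metrics.json" stands in for whatever file name the training script used.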
