How to use dynamic arguments in SageMaker job using Step Functions?


I am using AWS Step Functions to orchestrate some SageMaker jobs, but I can't dynamically pass the Var value to each job. Each launched container should run with the specific argument listed under Jobs in the following YAML file. Is there a workaround? I have tried using environment variables, but those don't seem to support dynamic values either.

Comment: "Run  SageMaker jobs in parallel with a max concurrency of 4"
StartAt: PrepareJobList
States:
  PrepareJobList:
    Type: Pass
    Result:
      Jobs: 
        - { Var: "X" }
        - { Var: "XX" }
    ResultPath: "$.JobList"
    Next: RunJobs

  RunJobs:
    Type: Map
    Iterator:
      StartAt: GenerateJobName
      States:
        GenerateJobName:
          Type: Pass
          Parameters:
            "TrainingJobName.$": "States.Format('Job-{}-{}', $.Var, States.UUID())"
            "Var.$": "$.Var"
          ResultPath: "$.JobDetails"
          Next: RunSageMakerJob

        RunSageMakerJob:
          Type: Task
          Resource: arn:aws:states:::sagemaker:createTrainingJob.sync
          Parameters:
            TrainingJobName.$: "$.JobDetails.TrainingJobName"
            AlgorithmSpecification:
              TrainingImage: "X"
              TrainingInputMode: "File"
              ContainerEntrypoint:
                - python
                - src/script.py
              ContainerArguments:
                - "--var"
                - "$.JobDetails.Var"
            StoppingCondition:
              MaxRuntimeInSeconds: 36000  
            RoleArn: "X"  
            ResourceConfig:
              InstanceType: "ml.m5.large"  
              InstanceCount: 1  
              VolumeSizeInGB: 50  
            OutputDataConfig:
              S3OutputPath: "x"
          Retry:
            - ErrorEquals: ["ResourceLimitExceeded"]
              IntervalSeconds: 300
              MaxAttempts: 10
              BackoffRate: 2.0
          End: true
    ItemsPath: "$.JobList.Jobs"
    MaxConcurrency: 4
    End: true

Fahmi
asked 2 months ago, 184 views
1 Answer
Accepted Answer

It's not exactly clear from the question, but I'm guessing the error you see is that the unresolved "$.JobDetails.Var" string is what gets passed through to your job?

Unlike TrainingJobName.$, your ContainerArguments key doesn't end with .$, so nothing tells Step Functions that the value is a path to resolve rather than a literal string.

However, since this field is an array, I believe you'll need to use the States.Array intrinsic function, with something like:

ContainerArguments.$: "States.Array('--var', $.JobDetails.Var)"

Unfortunately I don't have a similar complete example to hand to easily test with.
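As an untested sketch, this is roughly how that change would slot into the RunSageMakerJob state from the question. Everything except the ContainerArguments line is copied from the original definition, and the remaining fields are assumed to stay as they were:

```yaml
RunSageMakerJob:
  Type: Task
  Resource: arn:aws:states:::sagemaker:createTrainingJob.sync
  Parameters:
    TrainingJobName.$: "$.JobDetails.TrainingJobName"
    AlgorithmSpecification:
      TrainingImage: "X"
      TrainingInputMode: "File"
      ContainerEntrypoint:
        - python
        - src/script.py
      # The key ends in ".$", so the value is evaluated instead of passed literally.
      # States.Array builds the list ["--var", <value of $.JobDetails.Var>].
      ContainerArguments.$: "States.Array('--var', $.JobDetails.Var)"
    # StoppingCondition, RoleArn, ResourceConfig, OutputDataConfig: unchanged from the question
```

With the two items in the question's job list, the first container should then receive `--var X` and the second `--var XX`.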

For what it's worth, you may prefer to use the HyperParameters dictionary for this? Hyperparams are more broadly visible in SageMaker console & UIs than container arguments - and would be eligible for things like automatic HP tuning later if you wanted to use that. Provided hyperparameters should get automatically written into a /opt/ml/input/config/hyperparameters.json file in your container.
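A rough sketch of the HyperParameters variant, again untested. The hyperparameter name `var` is an assumption chosen to mirror the `--var` flag in the question, and the script would need to read the value from the hyperparameters file (or via a training toolkit) rather than from ContainerArguments:

```yaml
Parameters:
  TrainingJobName.$: "$.JobDetails.TrainingJobName"
  HyperParameters:
    # Assumption: "var" is an illustrative name; SageMaker expects string values here.
    # The ".$" suffix makes Step Functions resolve the path for each Map iteration.
    var.$: "$.JobDetails.Var"
  AlgorithmSpecification:
    TrainingImage: "X"
    TrainingInputMode: "File"
  # StoppingCondition, RoleArn, ResourceConfig, OutputDataConfig: unchanged from the question
```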

If you wanted, you could use the sagemaker-training-toolkit in your custom container to load environment variables & run your script with hyperparam CLI arguments like SageMaker Script Mode does. If you're already using one of the AWS-provided framework images, they'll already have something like this (might be slightly different e.g. the sagemaker-pytorch-training-toolkit or similar).

AWS
EXPERT
Alex_T
answered 2 months ago
  • The States.Array trick solved it! Thank you for the detailed explanation.
