How to use dynamic arguments in SageMaker job using Step Functions?

I am using AWS Step Functions to orchestrate some SageMaker jobs. However, I can't seem to dynamically retrieve the Var variable for each job. Each launched container should run with the specific argument given for it in the Jobs section of the following YAML file. Is there a workaround? I have tried using environment variables, but they don't seem to support dynamic arguments either.

Comment: "Run SageMaker jobs in parallel with a max concurrency of 4"
StartAt: PrepareJobList
States:
  PrepareJobList:
    Type: Pass
    Result:
      Jobs: 
        - { Var: "X" }
        - { Var: "XX" }
    ResultPath: "$.JobList"
    Next: RunJobs

  RunJobs:
    Type: Map
    Iterator:
      StartAt: GenerateJobName
      States:
        GenerateJobName:
          Type: Pass
          Parameters:
            "TrainingJobName.$": "States.Format('Job-{}-{}', $.Var, States.UUID())"
            "Var.$": "$.Var"
          ResultPath: "$.JobDetails"
          Next: RunSageMakerJob

        RunSageMakerJob:
          Type: Task
          Resource: arn:aws:states:::sagemaker:createTrainingJob.sync
          Parameters:
            TrainingJobName.$: "$.JobDetails.TrainingJobName"
            AlgorithmSpecification:
              TrainingImage: "X"
              TrainingInputMode: "File"
              ContainerEntrypoint:
                - python
                - src/script.py
              ContainerArguments:
                - "--var"
                - "$.JobDetails.Var"
            StoppingCondition:
              MaxRuntimeInSeconds: 36000  
            RoleArn: "X"  
            ResourceConfig:
              InstanceType: "ml.m5.large"  
              InstanceCount: 1  
              VolumeSizeInGB: 50  
            OutputDataConfig:
              S3OutputPath: "x"
          Retry:
            - ErrorEquals: ["ResourceLimitExceeded"]
              IntervalSeconds: 300
              MaxAttempts: 10
              BackoffRate: 2.0
          End: true
    ItemsPath: "$.JobList.Jobs"
    MaxConcurrency: 4
    End: true

1 Answer
Accepted Answer

It's not exactly clear from the question, but I'm guessing the error you see is that the unresolved "$.JobDetails.Var" string is what gets passed through to your job?

Unlike with TrainingJobName.$, you haven't ended the ContainerArguments parameter name with .$ to indicate that its value is a path to resolve rather than a literal string.

However, since this field is an array, I believe you'll need to build it with the States.Array intrinsic function, something like:

ContainerArguments.$: "States.Array('--var', $.JobDetails.Var)"

Unfortunately I don't have a similar complete example to hand to easily test with.
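
For reference, here is roughly how that change would sit in the RunSageMakerJob task from the question. This is an untested sketch: only ContainerArguments differs from the original definition, and the remaining fields are elided.

```yaml
RunSageMakerJob:
  Type: Task
  Resource: arn:aws:states:::sagemaker:createTrainingJob.sync
  Parameters:
    TrainingJobName.$: "$.JobDetails.TrainingJobName"
    AlgorithmSpecification:
      TrainingImage: "X"
      TrainingInputMode: "File"
      ContainerEntrypoint:
        - python
        - src/script.py
      # The ".$" suffix tells Step Functions to evaluate the value
      # (here an intrinsic function) instead of passing it as a literal.
      ContainerArguments.$: "States.Array('--var', $.JobDetails.Var)"
    # StoppingCondition, RoleArn, ResourceConfig, OutputDataConfig: unchanged
```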

For what it's worth, you may prefer to use the HyperParameters dictionary for this? Hyperparams are more broadly visible in SageMaker console & UIs than container arguments - and would be eligible for things like automatic HP tuning later if you wanted to use that. Provided hyperparameters should get automatically written into a /opt/ml/input/config/hyperparameters.json file in your container.
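
If you went the HyperParameters route, the task parameters might look something like the sketch below (untested). The key name var is just an illustrative choice, and SageMaker expects hyperparameter values to be strings, which Var already is in the question's job list.

```yaml
Parameters:
  TrainingJobName.$: "$.JobDetails.TrainingJobName"
  HyperParameters:
    # "var" is a hypothetical key name; ".$" resolves the path per Map item
    var.$: "$.JobDetails.Var"
  # AlgorithmSpecification and the remaining fields as in the question,
  # minus ContainerArguments
```

Without a toolkit in the container, the script would then need to read the value from that JSON file rather than from a --var command-line argument.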

If you wanted, you could use the sagemaker-training-toolkit in your custom container to load environment variables & run your script with hyperparam CLI arguments like SageMaker Script Mode does. If you're already using one of the AWS-provided framework images, they'll already have something like this (might be slightly different e.g. the sagemaker-pytorch-training-toolkit or similar).

AWS
EXPERT
answered a year ago
EXPERT
reviewed 9 months ago
  • The States.Array trick solved it! Thank you for the detailed explanation.
