
How to use dynamic arguments in SageMaker job using Step Functions?

I am using AWS Step Functions to orchestrate some SageMaker jobs, but I can't get the Var value passed dynamically to each job. Each launched container should run with the argument specified for it in the Jobs list in the YAML below. Is there a workaround? I have tried using the Environment variable field, but it doesn't seem to support dynamic values either.

Comment: "Run SageMaker jobs in parallel with a max concurrency of 4"
StartAt: PrepareJobList
States:
  PrepareJobList:
    Type: Pass
    Result:
      Jobs: 
        - { Var: "X" }
        - { Var: "XX" }
    ResultPath: "$.JobList"
    Next: RunJobs

  RunJobs:
    Type: Map
    Iterator:
      StartAt: GenerateJobName
      States:
        GenerateJobName:
          Type: Pass
          Parameters:
            "TrainingJobName.$": "States.Format('Job-{}-{}', $.Var, States.UUID())"
            "Var.$": "$.Var"
          ResultPath: "$.JobDetails"
          Next: RunSageMakerJob

        RunSageMakerJob:
          Type: Task
          Resource: arn:aws:states:::sagemaker:createTrainingJob.sync
          Parameters:
            TrainingJobName.$: "$.JobDetails.TrainingJobName"
            AlgorithmSpecification:
              TrainingImage: "X"
              TrainingInputMode: "File"
              ContainerEntrypoint:
                - python
                - src/script.py
              ContainerArguments:
                - "--var"
                - "$.JobDetails.Var"
            StoppingCondition:
              MaxRuntimeInSeconds: 36000  
            RoleArn: "X"  
            ResourceConfig:
              InstanceType: "ml.m5.large"  
              InstanceCount: 1  
              VolumeSizeInGB: 50  
            OutputDataConfig:
              S3OutputPath: "x"
          Retry:
            - ErrorEquals: ["ResourceLimitExceeded"]
              IntervalSeconds: 300
              MaxAttempts: 10
              BackoffRate: 2.0
          End: true
    ItemsPath: "$.JobList.Jobs"
    MaxConcurrency: 4
    End: true

1 Answer
Accepted Answer

It's not exactly clear from the question, but I'm guessing the error you see is that the unresolved "$.JobDetails.Var" string is what gets passed through to your job?

Unlike TrainingJobName.$, your ContainerArguments key doesn't end with .$, so Step Functions treats its value as literal text rather than as a path to resolve.

However, since this field is an array, I believe you'll need the States.Array intrinsic function to build it, something like:

ContainerArguments.$: "States.Array('--var', $.JobDetails.Var)"

Unfortunately I don't have a similar complete example to hand to easily test with.
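
If you want to sanity-check a definition for this kind of mistake before deploying, here is a minimal Python sketch. The helper name find_unresolved_paths and the trimmed-down dicts are my own illustration, not part of any AWS SDK. It walks a Parameters block and flags string values that start with "$" but sit under a key without the .$ suffix:

```python
def find_unresolved_paths(node, location="Parameters"):
    """Return locations of string values that look like JSONPath ('$...')
    but are not under a key ending in '.$', so Step Functions would pass
    them through as literal text. Hypothetical helper for illustration."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.endswith(".$"):
                # Value is evaluated as a path or intrinsic function; nothing to flag.
                continue
            problems.extend(find_unresolved_paths(value, f"{location}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_unresolved_paths(item, f"{location}[{index}]"))
    elif isinstance(node, str) and node.startswith("$"):
        problems.append(location)
    return problems


# Trimmed copy of the Parameters block from the question.
original = {
    "TrainingJobName.$": "$.JobDetails.TrainingJobName",
    "AlgorithmSpecification": {
        "ContainerEntrypoint": ["python", "src/script.py"],
        "ContainerArguments": ["--var", "$.JobDetails.Var"],
    },
}

# Same block with the States.Array fix applied.
fixed = {
    "TrainingJobName.$": "$.JobDetails.TrainingJobName",
    "AlgorithmSpecification": {
        "ContainerEntrypoint": ["python", "src/script.py"],
        "ContainerArguments.$": "States.Array('--var', $.JobDetails.Var)",
    },
}

print(find_unresolved_paths(original))
# ['Parameters.AlgorithmSpecification.ContainerArguments[1]']
print(find_unresolved_paths(fixed))
# []
```

This is only a heuristic: it would also flag a value that is genuinely meant to be a literal string beginning with "$".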

For what it's worth, you may prefer to use the HyperParameters dictionary for this? Hyperparameters are more broadly visible in the SageMaker console and UIs than container arguments, and would be eligible for things like automatic hyperparameter tuning later if you wanted to use that. The hyperparameters you provide should get written automatically into a /opt/ml/input/config/hyperparameters.json file in your container.
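
As a rough sketch of reading that file from your script: the code below assumes the state machine sets a hyperparameter named Var (a hypothetical key, for example via "Var.$": "$.JobDetails.Var" under HyperParameters). The function name is my own. Note that SageMaker hyperparameter values are strings, so convert them yourself if you need numbers.

```python
import json
from pathlib import Path

# Location SageMaker uses for training job hyperparameters inside the container.
HYPERPARAMETERS_PATH = Path("/opt/ml/input/config/hyperparameters.json")


def load_hyperparameters(path=HYPERPARAMETERS_PATH):
    """Load the hyperparameters file as a dict.

    Returns an empty dict when the file is absent, e.g. when the script
    is run locally outside SageMaker.
    """
    path = Path(path)
    if not path.exists():
        return {}
    with path.open() as handle:
        return json.load(handle)


if __name__ == "__main__":
    hyperparameters = load_hyperparameters()
    # "Var" is a hypothetical key; use whatever name the state machine sets.
    var = hyperparameters.get("Var")
    print(f"Var = {var}")
```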

If you wanted, you could use the sagemaker-training-toolkit in your custom container to load environment variables & run your script with hyperparam CLI arguments like SageMaker Script Mode does. If you're already using one of the AWS-provided framework images, they'll already have something like this (might be slightly different e.g. the sagemaker-pytorch-training-toolkit or similar).
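
Either way, whether the value arrives through ContainerArguments or as a Script Mode style CLI argument, the script side can look much the same. A minimal sketch, assuming the argument is named --var as in the question (the parse_args helper is my own naming):

```python
import argparse


def parse_args(argv=None):
    """Parse the per-job argument passed to the training script."""
    parser = argparse.ArgumentParser(description="Training entry point")
    parser.add_argument("--var", required=True,
                        help="Per-job value supplied by the state machine")
    # parse_known_args ignores any extra arguments the launcher may append.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example with an explicit argument list; in the container you would call
# parse_args() with no arguments so it reads sys.argv.
args = parse_args(["--var", "XX"])
print(f"Running job for var={args.var}")
# Running job for var=XX
```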

AWS
EXPERT
answered a year ago
AWS
EXPERT
reviewed 9 months ago
  • The States.Array trick solved it! Thank you for the detailed explanation.
