How to fix SageMaker training job error "SM_CHANNEL_TRAIN"?


I am building an ML workflow with AWS Step Functions, following this guide. However, when I start the state machine, I get this error:

AlgorithmError: framework error ... SM_CHANNEL_TRAIN ...exit code: 1 

Does anyone know how to fix this, or how to set SM_CHANNEL_TRAIN?

Thank you

asked 7 months ago · 48 views
1 Answer

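Assuming you are using the SageMaker Python SDK, you have to specify a train channel when you start the training job. This error usually means the training script tried to read SM_CHANNEL_TRAIN, but the job was started without a channel named train.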

The example below shows how to specify three channels and their respective S3 paths. Inside the training container, each channel is exposed as an environment variable named SM_CHANNEL_{channel_name.upper()}, whose value is the local directory the channel's data is downloaded to (/opt/ml/input/data/<channel_name>). For example, the train channel becomes SM_CHANNEL_TRAIN, and a channel named test123 becomes SM_CHANNEL_TEST123.
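On the training-script side, here is a minimal sketch of how the code might look up its channel directory and fail with an explicit message, instead of a bare KeyError, when the channel was never passed in. The helpers channel_env_var and resolve_channel_dir are hypothetical names for illustration, not part of the SageMaker SDK; the only SageMaker behavior assumed is the SM_CHANNEL_<NAME> naming and the /opt/ml/input/data/<name> location described above.

```python
import os
from typing import Mapping, Optional

# Root under which SageMaker places each input channel inside the container.
CHANNEL_ROOT = "/opt/ml/input/data"


def channel_env_var(channel_name: str) -> str:
    """Return the environment variable name SageMaker uses for a channel."""
    return f"SM_CHANNEL_{channel_name.upper()}"


def resolve_channel_dir(channel_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Look up the local directory for a channel (hypothetical helper).

    Raises RuntimeError with a readable hint if the variable is missing,
    which happens when the job was started without that channel.
    """
    env = os.environ if env is None else env
    var = channel_env_var(channel_name)
    if var not in env:
        raise RuntimeError(
            f"{var} is not set: no '{channel_name}' channel was passed "
            "in the training job's input data configuration"
        )
    return env[var]


if __name__ == "__main__":
    # Simulate the environment SageMaker would provide for a 'train' channel.
    fake_env = {"SM_CHANNEL_TRAIN": f"{CHANNEL_ROOT}/train"}
    print(resolve_channel_dir("train", fake_env))  # /opt/ml/input/data/train
```

Inside a real job you would call resolve_channel_dir("train") with no env argument so it reads os.environ.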

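import time
from sagemaker.estimator import Estimator

s3pth = 's3://mybucket'

data = {
    'train': f'{s3pth}/train',
    'validation': f'{s3pth}/validation',
    'test': f'{s3pth}/test',
}

# placeholder values: replace with your own image, role and instance type
estimator = Estimator(
    image_uri='<your-training-image-uri>',
    role='<your-sagemaker-execution-role-arn>',
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

# starting the train job with our uploaded datasets as input;
# each key in `data` becomes a channel (train -> SM_CHANNEL_TRAIN, etc.)
estimator.fit(
    inputs=data,
    # job_name=f"{experiment_name}-{time.strftime('%y%m%d-%H%M%S')}",
    # experiment_config={
    #     "TrialName": trial.trial_name,
    #     "TrialComponentDisplayName": "Training",
    # },
)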
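Since the question uses Step Functions, the training job may be created through the Step Functions SageMaker integration (a createTrainingJob task) rather than through the Python SDK. In that case the channel has to be declared in the InputDataConfig of the CreateTrainingJob request. The sketch below builds that list from the same kind of channel dict; input_data_config and task_parameters are hypothetical names, the bucket is a placeholder, and the other required request fields are deliberately left out.

```python
import json

S3_PREFIX = "s3://mybucket"  # placeholder bucket, as in the answer above


def input_data_config(channels: dict) -> list:
    """Build the InputDataConfig list for a CreateTrainingJob request.

    Each key becomes a ChannelName, which SageMaker exposes inside the
    container as SM_CHANNEL_<NAME>.
    """
    return [
        {
            "ChannelName": name,
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": uri,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
        for name, uri in channels.items()
    ]


# Hypothetical excerpt of the Task state's Parameters. Other required fields
# (TrainingJobName, AlgorithmSpecification, RoleArn, OutputDataConfig,
# ResourceConfig, StoppingCondition) are omitted from this sketch.
task_parameters = {
    "InputDataConfig": input_data_config(
        {
            "train": f"{S3_PREFIX}/train",
            "validation": f"{S3_PREFIX}/validation",
        }
    )
}

if __name__ == "__main__":
    print(json.dumps(task_parameters, indent=2))
```

The important part is that one entry has "ChannelName": "train"; without it, SM_CHANNEL_TRAIN is never set in the container.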
answered 3 months ago
