How to fix SageMaker training job error "SM_CHANNEL_TRAIN"?


I am building an ML workflow with Step Functions, following this. However, when I start the state machine, I get this error:

AlgorithmError: framework error ... SM_CHANNEL_TRAIN ...exit code: 1 

Does anyone know how to fix it, or how to set SM_CHANNEL_TRAIN?

Thank you

hai
asked 2 years ago, 405 views
1 Answer

Assuming you are using the SageMaker Python SDK, you have to specify the train channel when you start the training job.

The estimator.fit example in this answer specifies three channels and their respective S3 paths. In the training container that is started, each channel is exposed as the environment variable SM_CHANNEL_{channel_name.upper()}. For example, the train channel becomes SM_CHANNEL_TRAIN, and a channel named test123 would become SM_CHANNEL_TEST123.
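The naming rule can be written out as a tiny sketch. The two helper functions below are hypothetical and only illustrate the rule; they are not part of the SageMaker SDK. The mount path assumes the default location where SageMaker places channel data inside the container.

```python
# Illustrative helpers only: they mirror the naming rule and are not SageMaker APIs.

def channel_env_var(channel_name: str) -> str:
    """Name of the environment variable exposed for a given channel."""
    return f"SM_CHANNEL_{channel_name.upper()}"


def channel_mount_path(channel_name: str) -> str:
    """Default directory inside the container that holds the channel's data."""
    return f"/opt/ml/input/data/{channel_name}"


# Print the variable name and directory for a few example channel names.
for name in ("train", "validation", "test123"):
    print(channel_env_var(name), "->", channel_mount_path(name))
```

If no channel named train is passed to the job, SM_CHANNEL_TRAIN is simply not set in the container, which is one plausible cause of the error in the question.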

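from sagemaker.estimator import Estimator

# Placeholder values: replace them with your own training image, IAM role and instance settings.
estimator = Estimator(
    image_uri='<your-training-image-uri>',
    role='<your-sagemaker-execution-role-arn>',
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

s3pth = 's3://mybucket'

# Each key is a channel name; each value is the S3 prefix that holds that channel's data.
data = {
    'train': f'{s3pth}/train',
    'validation': f'{s3pth}/validation',
    'test': f'{s3pth}/test',
}

# Start the training job with the uploaded datasets as input channels.
estimator.fit(
    data,
    wait=False,
    # job_name = f"{experiment_name}--{pd.Timestamp.now().strftime('%y%m%d-%H%M%S')}",
    # experiment_config = {
    #     "TrialName": trial.trial_name,
    #     "TrialComponentDisplayName": "Training",
    # },
)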
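On the training script side, a common pattern is to read the channel directory from the environment with a fallback instead of indexing os.environ directly. The following is a minimal sketch, assuming your entry point is a Python script; the function name parse_args and the argument names --train and --validation are illustrative choices, not something the original workflow is known to use.

```python
import argparse
import os


def parse_args(argv=None):
    """Resolve data directories from CLI arguments, falling back to the SM_CHANNEL_* variables."""
    parser = argparse.ArgumentParser()
    # os.environ.get returns None when the channel was not passed to the job,
    # so a missing channel produces a clear message instead of a bare KeyError.
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--validation", default=os.environ.get("SM_CHANNEL_VALIDATION"))
    args = parser.parse_args(argv)
    if args.train is None:
        parser.error("no 'train' channel found: pass a 'train' key to estimator.fit() or use --train")
    return args


# Local dry run: simulate what the container provides when a 'train' channel exists.
os.environ.setdefault("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
args = parse_args([])
print("training data directory:", args.train)
```

Inside a real training job the setdefault line is unnecessary, because the container sets the variable for every channel that was passed in.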
answered 2 years ago
