How to fix SageMaker training job error "SM_CHANNEL_TRAIN"?

I am building an ML workflow with Step Functions, following this. However, when I start the state machine, I get this error:

AlgorithmError: framework error ... SM_CHANNEL_TRAIN ...exit code: 1 

Does anyone know how to fix it, or how to set SM_CHANNEL_TRAIN?

Thank you

hai
asked 2 years ago · 410 views
1 Answer

Assuming you are using the SageMaker Python SDK, you have to specify the train channel when you start the training job.

The example below shows how to specify three channels and their S3 paths. In the training container that is started, each channel is exposed through the environment variable SM_CHANNEL_{channel_name.upper()}, which by default holds the local directory where that channel's data is available (/opt/ml/input/data/<channel_name>). For example, the train channel becomes SM_CHANNEL_TRAIN, and a channel named test123 would become SM_CHANNEL_TEST123.

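from sagemaker.estimator import Estimator

# Placeholder values (not from the original post): replace them with your own
# training image, IAM role, and instance settings.
estimator = Estimator(
    image_uri='<your-training-image-uri>',
    role='<your-sagemaker-execution-role-arn>',
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

s3pth = 's3://mybucket'

# Channel name -> S3 prefix. Each key becomes SM_CHANNEL_<KEY> in the container.
data = {
    'train': f'{s3pth}/train',
    'validation': f'{s3pth}/validation',
    'test': f'{s3pth}/test',
}

# Start the training job with the uploaded datasets as input channels.
estimator.fit(
    data,
    wait=False,
    # job_name = f"{experiment_name}--{pd.Timestamp.now().strftime('%y%m%d-%H%M%S')}",
    # experiment_config = {
    #     "TrialName": trial.trial_name,
    #     "TrialComponentDisplayName": "Training",
    # },
)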
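
On the container side, the training script usually picks these variables up as defaults for its command-line arguments. The error in the question is truncated, so this is only a guess at the cause, but a script that reads SM_CHANNEL_TRAIN when no train channel was passed to the job will typically fail at that point. Below is a minimal sketch of such a script, assuming the three channel names used above; the function name parse_channel_args and its environ parameter are hypothetical and only there to make the logic easy to try locally.

```python
import argparse
import os


def parse_channel_args(argv=None, environ=None):
    """Resolve local channel directories from the SM_CHANNEL_* variables.

    parse_channel_args and its parameters are illustrative names,
    not part of the SageMaker SDK.
    """
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Each default is None when the matching channel was not passed to the job.
    parser.add_argument("--train", default=env.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--validation", default=env.get("SM_CHANNEL_VALIDATION"))
    parser.add_argument("--test", default=env.get("SM_CHANNEL_TEST"))
    # Hyperparameters also arrive as CLI arguments; ignore the ones not declared here.
    args, _unknown = parser.parse_known_args(argv)
    if args.train is None:
        # Fail with a readable message instead of a bare KeyError.
        raise SystemExit(
            "SM_CHANNEL_TRAIN is not set: pass a 'train' channel when starting the job."
        )
    return args
```

With the data dictionary above, args.train would point at the local copy of the train channel inside the container, and the explicit check turns a missing channel into a clear message in the job logs.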
answered 2 years ago
