How to fix SageMaker training job error "SM_CHANNEL_TRAIN"?


I am building an ML workflow using Step Functions, following this. However, when I start the state machine, I get this error:

AlgorithmError: framework error ... SM_CHANNEL_TRAIN ...exit code: 1 

Does anyone know how to fix it, or how to set SM_CHANNEL_TRAIN?

Thank you

hai
asked 2 years ago, 404 views
1 answer

Assuming you are using the SageMaker Python SDK, you have to specify the train channel when you start the training job.

The example below shows how to specify three channels and their respective S3 paths. In the training container, each channel is exposed as the environment variable SM_CHANNEL_{channel_name.upper()}, which holds the local directory containing that channel's data. For example, the train channel becomes SM_CHANNEL_TRAIN, and a channel named test123 would become SM_CHANNEL_TEST123.

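from sagemaker.estimator import Estimator

# Placeholder values (not from the original post): replace with your own
# training image, execution role, and instance type.
estimator = Estimator(
    image_uri='<your-training-image-uri>',
    role='<your-sagemaker-execution-role-arn>',
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

s3pth = 's3://mybucket'

# Each key is a channel name; in the container it becomes SM_CHANNEL_<KEY>.
data = {
    'train': f'{s3pth}/train',
    'validation': f'{s3pth}/validation',
    'test': f'{s3pth}/test',
}

# Start the training job with our uploaded datasets as input.
estimator.fit(
    data,
    wait=False,
    # job_name = f"{experiment_name}--{pd.Timestamp.now().strftime('%y%m%d-%H%M%S')}",
    # experiment_config = {
    #     "TrialName": trial.trial_name,
    #     "TrialComponentDisplayName": "Training",
    # },
)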
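
On the training script side, the error usually means the script reads SM_CHANNEL_TRAIN but no channel named train was passed to the job. Here is a minimal sketch of how a script could read the channel directories defensively. The function name parse_channel_dirs and the error message are my own, hypothetical choices, and I am assuming a script-mode style entry point that receives its inputs through environment variables:

```python
import argparse
import os


def parse_channel_dirs(argv=None, environ=None):
    """Resolve local data directories for the channels passed to fit().

    Hypothetical helper: `environ` is injectable only so the function
    can be exercised outside of a SageMaker container.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # SM_CHANNEL_<NAME> only exists when a channel called <name> was supplied,
    # so use .get() to avoid a KeyError when the channel is missing.
    parser.add_argument("--train", default=environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--validation", default=environ.get("SM_CHANNEL_VALIDATION"))
    parser.add_argument("--test", default=environ.get("SM_CHANNEL_TEST"))
    # Ignore any other arguments (for example hyperparameters) passed to the script.
    args, _ = parser.parse_known_args(argv)
    if args.train is None:
        raise SystemExit(
            "No 'train' channel found: pass a 'train' key in the job's input "
            "data or run the script with --train <dir>."
        )
    return args
```

Calling parse_channel_dirs() at the top of the training script then either returns the directories or fails with a message that points at the missing channel, instead of a bare framework error.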
answered 2 years ago
