How to fix SageMaker training job error "SM_CHANNEL_TRAIN"?

I am building an ML workflow with Step Functions, following this. However, when I start the state machine, I get this error:

AlgorithmError: framework error ... SM_CHANNEL_TRAIN ...exit code: 1 

Does anyone know how to fix it, or how to set SM_CHANNEL_TRAIN?

Thank you

hai
Asked 2 years ago, 367 views
1 Answer

Assuming you are using the SageMaker Python SDK, you have to specify the train channel when you start the training job.

The example below specifies three channels and their S3 paths. In the training container, each channel is exposed as the environment variable SM_CHANNEL_{channel_name.upper()}, which holds the local path of that channel's data. For example, the train channel becomes SM_CHANNEL_TRAIN, and a channel named test123 would become SM_CHANNEL_TEST123.

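from sagemaker.estimator import Estimator

# Placeholder values (assumptions): replace with your own image, role, and instance settings.
estimator = Estimator(
    image_uri='<your-training-image-uri>',
    role='<your-sagemaker-execution-role-arn>',
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

s3pth = 's3://mybucket'

# Each key is a channel name; each value is the S3 prefix holding that channel's data.
data = {
    'train': f'{s3pth}/train',
    'validation': f'{s3pth}/validation',
    'test': f'{s3pth}/test',
}

# Start the training job with the uploaded datasets as input channels.
estimator.fit(
    data,
    wait=False,
    # job_name = f"{experiment_name}--{pd.Timestamp.now().strftime('%y%m%d-%H%M%S')}",
    # experiment_config = {
    #     "TrialName": trial.trial_name,
    #     "TrialComponentDisplayName": "Training",
    # },
)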
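
On the container side, the training script usually reads the channel directory from that environment variable. Here is a minimal sketch of that lookup, assuming a script-mode entry point. The helper names channel_env_var and resolve_channel_dir are hypothetical and not part of the SageMaker SDK. The lookup raises a readable error when the channel was never passed to the job, which is one likely cause of the failure in the question.

```python
import os


def channel_env_var(channel_name):
    """Return the environment variable name used for an input channel."""
    return f"SM_CHANNEL_{channel_name.upper()}"


def resolve_channel_dir(channel_name, environ=None):
    """Look up the local directory of an input channel (hypothetical helper).

    Raises RuntimeError with a descriptive message if the channel was not
    configured for the training job.
    """
    # Allow a custom mapping to be injected so the lookup can be tested locally.
    environ = os.environ if environ is None else environ
    var = channel_env_var(channel_name)
    try:
        return environ[var]
    except KeyError:
        raise RuntimeError(
            f"{var} is not set; pass a '{channel_name}' channel when starting the job"
        ) from None
```

In the training script you would call resolve_channel_dir("train") and read the dataset files from the returned directory. If the job was started without a train channel, the call fails immediately with a message naming the missing variable instead of a bare KeyError.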
Answered 2 years ago
