SageMaker Autopilot training jobs stopped without any logs


I'm trying to use SageMaker Autopilot. I'm creating new jobs via the CreateAutoMLJob API.

The processing jobs succeed, but the training jobs stop almost instantly after they start. Here's an example of the status history of one of the training jobs:

Status	    Start time	            End time	            Description
Starting	4/3/2023, 7:03:42 PM	4/3/2023, 7:04:53 PM	Preparing the instances for training
Downloading	4/3/2023, 7:04:53 PM	4/3/2023, 7:04:53 PM	Downloading input data
Stopping	4/3/2023, 7:04:53 PM	4/3/2023, 7:04:53 PM	Stopping the training job
Stopped	    4/3/2023, 7:04:53 PM	4/3/2023, 7:04:53 PM	Training job stopped

There are no logs available, and there are no errors in the console. When I click the View logs button that redirects me to CloudWatch, the Log Group does not exist.

Asked a year ago · 397 views
1 Answer

Hello,

Thank you for using SageMaker Autopilot.

From the details provided, it is difficult to determine why the training job stopped without an error message. I also see that several secondary statuses are missing from the screenshot attached to your question. Logs are needed to understand the issue. Please confirm that the IAM role used has the correct permissions to push logs to CloudWatch. Once that is confirmed, retry the AutoML job and check whether it publishes logs to CloudWatch. If logs are still not populated and the training job stops without any log information, please reach out to the Support Engineering team with all the details so that they can investigate the issue with internal tools and help you mitigate it.
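As a rough first check of the role's permissions, a sketch like the following could scan a policy document for the CloudWatch Logs actions that training jobs typically need. The function name `missing_log_actions` and the example policy are hypothetical, the list of required actions is an assumption based on common SageMaker execution-role policies, and the check is deliberately simplified: it ignores `Deny` statements, `NotAction`, resources, and conditions, so it is not a substitute for the IAM policy simulator.

```python
from fnmatch import fnmatchcase

# Assumed list of CloudWatch Logs actions a training job needs to publish logs.
REQUIRED_LOG_ACTIONS = [
    "logs:CreateLogGroup",
    "logs:CreateLogStream",
    "logs:DescribeLogStreams",
    "logs:PutLogEvents",
]


def _as_list(value):
    """IAM allows a single item or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_log_actions(policy_document, required=REQUIRED_LOG_ACTIONS):
    """Return the required actions that no Allow statement matches.

    Simplified: ignores Deny, NotAction, Resource, and Condition.
    """
    allowed_patterns = []
    for statement in _as_list(policy_document.get("Statement")):
        if statement.get("Effect") == "Allow":
            allowed_patterns.extend(_as_list(statement.get("Action")))

    missing = []
    for action in required:
        # IAM action names are case-insensitive and may contain wildcards.
        if not any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in allowed_patterns
        ):
            missing.append(action)
    return missing


# Hypothetical policy attached to the execution role.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "logs:CreateLogStream"],
            "Resource": "*",
        },
        {"Effect": "Allow", "Action": "logs:Put*", "Resource": "*"},
    ],
}

print(missing_log_actions(example_policy))
# ['logs:CreateLogGroup', 'logs:DescribeLogStreams']
```

An empty result only means the actions are allowed somewhere in that one document; a permissions boundary or an explicit deny elsewhere could still block log delivery.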

You can open a support case with AWS using this link:

https://console.aws.amazon.com/support/home?#/case/create
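When collecting details for the case, it may help to include what the `DescribeTrainingJob` API reports, since it can carry a `FailureReason` and status messages even when no CloudWatch log group exists. The sketch below is one way to do that with boto3; the helper names, the job name, and the sample response are hypothetical, and the sample values simply mirror the status history shown in the question.

```python
def summarize_training_job(description):
    """Format the fields of a DescribeTrainingJob response that matter here."""
    lines = [
        f"Job: {description.get('TrainingJobName', '<unknown>')}",
        "Status: {} / {}".format(
            description.get("TrainingJobStatus", "<unknown>"),
            description.get("SecondaryStatus", "<unknown>"),
        ),
        f"FailureReason: {description.get('FailureReason', '<none reported>')}",
    ]
    for transition in description.get("SecondaryStatusTransitions", []):
        lines.append(
            f"  {transition.get('Status')}: {transition.get('StatusMessage', '')}"
        )
    return "\n".join(lines)


def fetch_description(job_name, region_name=None):
    """Call DescribeTrainingJob; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the formatter works without it

    client = boto3.client("sagemaker", region_name=region_name)
    return client.describe_training_job(TrainingJobName=job_name)


# Hand-written sample shaped like a DescribeTrainingJob response.
sample = {
    "TrainingJobName": "example-automl-training-job",
    "TrainingJobStatus": "Stopped",
    "SecondaryStatus": "Stopped",
    "SecondaryStatusTransitions": [
        {"Status": "Starting", "StatusMessage": "Preparing the instances for training"},
        {"Status": "Downloading", "StatusMessage": "Downloading input data"},
        {"Status": "Stopping", "StatusMessage": "Stopping the training job"},
        {"Status": "Stopped", "StatusMessage": "Training job stopped"},
    ],
}

print(summarize_training_job(sample))
```

For a real job, `summarize_training_job(fetch_description("<your training job name>"))` would produce the same summary from the live response.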

AWS
Support Engineer
Answered a year ago
