SageMaker batch transform 415 error


Hi, I need to run XGBoost inference on 15 million samples (3.9 GB when stored as CSV). Since batch transform does not seem to work on such large batches (the maximum payload is 100 MB), I split my input file into 646 files of about 6 MB each and stored them in S3. I am running the code below:
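
The post does not show how the 3.9 GB file was split. One way to do it is sketched below. It assumes a headerless CSV, because SageMaker's built-in XGBoost expects CSV input without a header row, and it only breaks on line boundaries so that no record is cut in half. `split_csv` and the `chunk_00000.csv` naming are hypothetical, not part of any SageMaker tooling.

```python
import os


def split_csv(path, out_dir, max_bytes=6 * 1024 * 1024):
    """Split a headerless CSV into chunks of at most ~max_bytes, breaking only
    between lines. Returns the list of chunk paths written.

    A single line longer than max_bytes ends up alone in its own chunk
    instead of being cut in half.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    written = 0
    with open(path, "rb") as src:
        for line in src:
            # Open a new chunk if none is open yet, or if adding this line
            # would push a non-empty chunk past the size limit.
            if out is None or (written > 0 and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                chunk_path = os.path.join(out_dir, f"chunk_{len(paths):05d}.csv")
                out = open(chunk_path, "wb")
                paths.append(chunk_path)
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths
```

For example, `split_csv("test.csv", "testchunks/")` would write chunks of roughly 6 MB or less, ready to be uploaded under the S3 prefix passed to `transform()`.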

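# XGB is the trained XGBoost Estimator (SageMaker Python SDK)
transformer = XGB.transformer(
    instance_count=2, instance_type='ml.c5.9xlarge',
    output_path='s3://xxxxxxxxxxxxx/sagemaker/recsys/xgbtransform/',
    max_payload=100)  # maximum payload per request, in MB

# Run inference on every object under the S3 prefix, splitting each file on newlines
transformer.transform(
    data='s3://xxxxxxxxxxxxx/sagemaker/recsys/testchunks/',
    split_type='Line')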

But the job fails. SageMaker reports "ClientError: Too many objects failed. See logs for more information", and the CloudWatch logs show:

Bad HTTP status returned from invoke: 415
'NoneType' object has no attribute 'lower'

Did I forget something in my batch transform settings?

AWS Expert
Asked 6 years ago · 918 views
1 answer
Accepted answer

This indicates that the algorithm thinks it has been passed bad data. Perhaps there is a problem with how you split the file?
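
The second log line is exactly the message Python raises when `.lower()` is called on `None`. The question's `transform()` call also passes no `content_type`. One possible reading, then, is that the requests reached the model server without a Content-Type header. The handler below is hypothetical and is not the container's real code. It shows how that situation would produce both symptoms. `handle_invocation` and the supported-type set are assumptions made for illustration.

```python
# Hypothetical content types for this sketch; not the container's actual list.
SUPPORTED_CONTENT_TYPES = {"text/csv", "text/libsvm"}


def handle_invocation(content_type, body):
    """Hypothetical /invocations handler, NOT the SageMaker XGBoost container's code.

    Returns an (HTTP status, payload or error message) tuple.
    """
    try:
        # With no Content-Type header, content_type is None and .lower() raises
        # AttributeError: 'NoneType' object has no attribute 'lower'.
        mime = content_type.lower()
    except AttributeError as err:
        return 415, str(err)  # 415 Unsupported Media Type
    if mime not in SUPPORTED_CONTENT_TYPES:
        return 415, f"Unsupported content type: {mime}"
    return 200, body
```

If this reading is right, one more thing to try is passing `content_type='text/csv'` to `transformer.transform()`.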

I would suggest two things:

  1. Try running the algorithm on the original, unsplit data with the arguments "SplitType": "Line" and "BatchStrategy": "MultiRecord", and see if you have better luck.
  2. Look in the CloudWatch logs for your run to see whether they contain any helpful information about what the algorithm didn't like. You can find them in the log group "/aws/sagemaker/TransformJobs", in the log stream whose name begins with your job name.
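
The two suggestions above can be sketched in Python. This assumes you call `create_transform_job` on a boto3 SageMaker client and `filter_log_events` on a boto3 CloudWatch Logs client. `build_transform_request` and `find_error_lines` are hypothetical helpers, and the keyword list is illustrative. Setting `ContentType` to `text/csv` is an assumption added here, because the answer does not mention it and the question's call sets no content type.

```python
def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri,
                            instance_type="ml.c5.9xlarge", instance_count=2):
    """Keyword arguments for boto3's SageMaker create_transform_job (hypothetical helper)."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "MaxPayloadInMB": 100,          # batch transform rejects values above 100
        "BatchStrategy": "MultiRecord",  # pack as many records as fit into each request
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",   # assumption: tell the container how to parse input
            "SplitType": "Line",         # split each S3 object on newlines
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri, "AssembleWith": "Line"},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def find_error_lines(logs_client, job_name,
                     keywords=("Bad HTTP status", "Error", "NoneType")):
    """Collect messages from the job's log streams that contain any keyword.

    Searches the /aws/sagemaker/TransformJobs log group, restricted to streams
    whose names begin with job_name, following nextToken until exhausted.
    """
    kwargs = {
        "logGroupName": "/aws/sagemaker/TransformJobs",
        "logStreamNamePrefix": job_name,
    }
    hits = []
    while True:
        resp = logs_client.filter_log_events(**kwargs)
        for event in resp.get("events", []):
            message = event["message"]
            if any(k in message for k in keywords):
                hits.append(message)
        token = resp.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
    return hits
```

With real clients, the calls would be `boto3.client("sagemaker").create_transform_job(**build_transform_request(...))` and `find_error_lines(boto3.client("logs"), job_name)`. In the SageMaker Python SDK used in the question, the equivalent settings are `strategy='MultiRecord'` on `XGB.transformer(...)` and `content_type='text/csv', split_type='Line'` on `transformer.transform(...)`.
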
Answered 6 years ago
