How can we use a JSON input for an AWS SageMaker Batch Transform job?


It's quite unclear how to use a JSON input file with an AWS SageMaker Batch Transform job. Most of the documentation I have reviewed only provides CSV examples.

I did come across some scattered forum threads online and managed to come up with the following:

# run-batch-test.py
import boto3

sagemaker_client = boto3.client("sagemaker")

batch_job_name = f"{batch_job_name}"
model_name = f"{batch_model_name}"
payload_size = 1
max_concurrent_transform_jobs = 100
environment_variables = {}
input_s3_path = "./input.json"
output_s3_path = f"{output_s3_path}"
instance_type = "ml.m5.xlarge"
instance_count = 1

# Create the request dictionary
request = {
    "TransformJobName": batch_job_name,
    "ModelName": model_name,
    "MaxPayloadInMB": payload_size,
    "BatchStrategy": "SingleRecord",
    "MaxConcurrentTransforms": max_concurrent_tranform_jobs,
    "Environment": environment_variables,
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_path
            }
        },
        "ContentType": "application/json",
        "SplitType": "Line",
        "CompressionType": "None"
    },
    "TransformOutput": {
        "S3OutputPath": output_s3_path,
    },
    "TransformResources": {
        "InstanceType": instance_type,
        "InstanceCount": instance_count
    }
}

# Create the Batch Transform job
try:
    response = sagemaker_client.create_transform_job(**request)
    print("Batch Transform job created:", response['TransformJobArn'])
except Exception as e:
    print("Error creating Batch Transform job:", str(e))
# input.json
{ "key1": "value1", "key2": "value2", "key3": "value3", "key4": "value4", "key5": "value5"}
{ "key1": "value1", "key2": "value2", "key3": "value3", "key4": "value4", "key5": "value5"}

Each JSON object is on its own line in the input file. After running this job, it fails with a 500 error: "Bad HTTP status received from algorithm: 500".
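In case it helps with debugging: the FailureReason returned by DescribeTransformJob is usually more specific than the console error. A minimal check, reusing the sagemaker_client and batch_job_name from the script above:

# Look up the failed job; FailureReason often narrows the problem down.
job = sagemaker_client.describe_transform_job(TransformJobName=batch_job_name)
print(job["TransformJobStatus"])
print(job.get("FailureReason", "no failure reason reported"))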

Any ideas or issues you see with this?

Kayle
asked a month ago · 142 views
2 Answers

Hi there,

For S3Uri, it looks like you are passing the local path ./input.json. Have you tried passing the full S3 URI?

From the API reference: "S3Uri – Depending on the value specified for the S3DataType, identifies either a key name prefix or a manifest. For example, a key name prefix might look like this: s3://bucketname/exampleprefix/"

See https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TransformS3DataSource.html#sagemaker-Type-TransformS3DataSource-S3Uri
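In other words, the input and output paths in your script should be full S3 URIs, something like this (bucket and prefix names are placeholders):

# Placeholder bucket/prefix; S3Uri must point at an S3 location, not a local file.
input_s3_path = "s3://bucketname/inputjsons/"        # prefix containing input.json
output_s3_path = "s3://bucketname/transform-output/"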

AWS
EXPERT
Matt-B
answered a month ago

Hi @Matt-B

That seems to have fixed the initial processing, but it still fails when parsing the input.

I've changed it to an S3 directory prefix of s3://bucketname/inputjsons/. In this directory, I've placed the single input.json file for now.

Once I run the job, it seems to POST to the /invocations endpoint correctly. I can see the logs of it running the model.

However, the overall job fails with a 400 Bad Request Error.

[sagemaker logs]: MaxConcurrentTransforms=100, MaxPayloadInMB=1, BatchStrategy=SINGLE_RECORD
[sagemaker logs]: bucketname/inputjsons/input.json: ClientError: 400
[sagemaker logs]: bucketname/inputjsons/input.json:
[sagemaker logs]: bucketname/inputjsons/input.json: Message:
[sagemaker logs]: bucketname/inputjsons/input.json: <!doctype html>
[sagemaker logs]: bucketname/inputjsons/input.json: <html lang=en>
[sagemaker logs]: bucketname/inputjsons/input.json: <title>400 Bad Request</title>
[sagemaker logs]: bucketname/inputjsons/input.json: <h1>Bad Request</h1>
[sagemaker logs]: bucketname/inputjsons/input.json: <p>The browser (or proxy) sent a request that this server could not understand.</p>
Kayle
answered 25 days ago
