How can we use a JSON input for an AWS SageMaker Batch Transform job?


It's quite unclear how to use a JSON input file for an AWS SageMaker Batch Transform job. Most of the documentation I have reviewed only provides CSV examples.

I came across some scattered forum threads online and managed to come up with:

# run-batch-test.py
import boto3

sagemaker_client = boto3.client("sagemaker")

# Placeholder values -- substitute your own names and paths
batch_job_name = "my-batch-transform-job"
model_name = "my-model"
payload_size = 1  # MaxPayloadInMB
max_concurrent_transform_jobs = 100
environment_variables = {}
input_s3_path = "./input.json"
output_s3_path = "s3://bucketname/output/"
instance_type = "ml.m5.xlarge"
instance_count = 1

# Create the request dictionary
request = {
    "TransformJobName": batch_job_name,
    "ModelName": model_name,
    "MaxPayloadInMB": payload_size,
    "BatchStrategy": "SingleRecord",
    "MaxConcurrentTransforms": max_concurrent_tranform_jobs,
    "Environment": environment_variables,
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_path
            }
        },
        "ContentType": "application/json",
        "SplitType": "Line",
        "CompressionType": "None"
    },
    "TransformOutput": {
        "S3OutputPath": output_s3_path,
    },
    "TransformResources": {
        "InstanceType": instance_type,
        "InstanceCount": instance_count
    }
}

# Create the Batch Transform job
try:
    response = sagemaker_client.create_transform_job(**request)
    print("Batch Transform job created:", response['TransformJobArn'])
except Exception as e:
    print("Error creating Batch Transform job:", str(e))
# input.json
{ "key1": "value1", "key2": "value2", "key3": "value3", "key4": "value4", "key5": "value5"}
{ "key1": "value1", "key2": "value2", "key3": "value3", "key4": "value4", "key5": "value5"}

Each JSON object is on its own line in the input file. When I run the job, it fails with a 500 error: "Bad HTTP status received from algorithm: 500".
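
In case it's useful for debugging, the failure details can also be pulled with describe_transform_job (sketch below, reusing the client and placeholder job name from the script above):

# Inspect why the transform job failed
status = sagemaker_client.describe_transform_job(TransformJobName=batch_job_name)
print(status["TransformJobStatus"])
print(status.get("FailureReason", "No failure reason reported"))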

Any ideas or issues you see with this?

Kayle
asked 2 months ago · 163 views
2 Answers

Hi there,

For S3Uri, it looks like you are passing the local path string ./input.json. Have you tried passing the full S3 URI instead?

From the docs: "S3Uri: Depending on the value specified for the S3DataType, identifies either a key name prefix or a manifest. For example, a key name prefix might look like this: s3://bucketname/exampleprefix/"

See https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TransformS3DataSource.html#sagemaker-Type-TransformS3DataSource-S3Uri
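
For example, a quick sketch (bucket and key names below are placeholders):

# Upload the local JSON Lines file, then point the job at its S3 location
import boto3

s3 = boto3.client("s3")
s3.upload_file("./input.json", "bucketname", "inputjsons/input.json")

# Use the prefix (or the full object URI) as input_s3_path in your request
input_s3_path = "s3://bucketname/inputjsons/"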

AWS
EXPERT
Matt-B
answered 2 months ago

Hi @Matt-B

That seems to have fixed the initial processing, but it still fails on parsing the input.

I've changed it to an S3 directory prefix of s3://bucketname/inputjsons/. In this directory, I've placed the single input.json file for now.

Once I run the job, it seems to POST to the /invocations endpoint correctly. I can see the logs of it running the model.

However, the overall job fails with a 400 Bad Request Error.

[sagemaker logs]: MaxConcurrentTransforms=100, MaxPayloadInMB=1, BatchStrategy=SINGLE_RECORD
[sagemaker logs]: bucketname/inputjsons/input.json: ClientError: 400
[sagemaker logs]: bucketname/inputjsons/input.json:
[sagemaker logs]: bucketname/inputjsons/input.json: Message:
[sagemaker logs]: bucketname/inputjsons/input.json: <!doctype html>
[sagemaker logs]: bucketname/inputjsons/input.json: <html lang=en>
[sagemaker logs]: bucketname/inputjsons/input.json: <title>400 Bad Request</title>
[sagemaker logs]: bucketname/inputjsons/input.json: <h1>Bad Request</h1>
[sagemaker logs]: bucketname/inputjsons/input.json: <p>The browser (or proxy) sent a request that this server could not understand.</p>
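
That HTML body looks like Flask/Werkzeug's default 400 page, so I suspect the container's /invocations handler isn't managing to parse the request body it receives. For anyone hitting the same thing with a custom Flask serving container, a more tolerant handler might look like this sketch (predict_fn is a hypothetical stand-in for the actual model call):

# app.py -- sketch of an /invocations handler for per-line JSON payloads
import json
from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/invocations", methods=["POST"])
def invocations():
    # With SplitType=Line and BatchStrategy=SingleRecord, each request
    # body is a single JSON record from the input file
    record = json.loads(request.get_data(as_text=True))
    result = predict_fn(record)  # hypothetical stand-in for the model call
    return Response(json.dumps(result), mimetype="application/json")
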
Kayle
answered 2 months ago
