How to get batch transform working with JSONL data?


I am using my own inference.py file as the entry point for inference. I have tested this PyTorch model served as a real-time endpoint in Amazon SageMaker, but when I try to create a batch transform job with multiple JSON objects in my input file (JSONL format), I get the following error from the input_fn function on the line data = json.loads(request_body), as shown in the CloudWatch logs:

data = json.loads(request_body)
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char ..)

I am not sure why I am getting an "Extra data" error at line 2, since this is supposed to be a batch job with multiple JSON inputs, one per line.

inference.py

import json


def model_fn(model_dir):
    # load the model
    ...


def input_fn(request_body, request_content_type):
    input_data = json.loads(request_body)
    return input_data


def predict_fn(input_data, model):
    return model.predict(input_data)

set up batch job

response = client.create_transform_job(
    TransformJobName='some-job',
    ModelName='mypytorchmodel',
    ModelClientConfig={
        'InvocationsTimeoutInSeconds': 3600,
        'InvocationsMaxRetries': 1
    },
    BatchStrategy='MultiRecord',
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://inputpath'
            }
        },
        'ContentType': 'application/json',
        'SplitType': 'Line'
    },
    TransformOutput={
        'S3OutputPath': 's3://outputpath',
        'Accept': 'application/json',
        'AssembleWith': 'Line',
    },
    TransformResources={
        'InstanceType': 'ml.g4dn.xlarge',
        'InstanceCount': 1
    }
)

input file

{"input" : "some text here"}
{"input" : "another"}
...
1 Answer

You're seeing this because of your MultiRecord batch strategy: SageMaker knows how to split your source data files into individual records (because you configured SplitType), but it composes batches of multiple records and sends each batch to your model/endpoint as a single request. Your inference input handler can only interpret single JSON objects, not JSONLines chunks.
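
To make it concrete: under MultiRecord with SplitType='Line', your input_fn receives several of your input lines joined into a single body, which is exactly what json.loads chokes on. A quick local repro, using two lines from your input file:

import json

# Roughly what input_fn receives under MultiRecord: multiple records from
# the JSONL file concatenated into one request body.
batched_body = '{"input" : "some text here"}\n{"input" : "another"}'

json.loads(batched_body)
# raises json.decoder.JSONDecodeError: Extra data: line 2 column 1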

One way of fixing this would be to switch to SingleRecord batch strategy, which would result in each record triggering a separate inference request to your model.
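
For reference, a minimal sketch of that variant, keeping the rest of your job parameters as posted (the job name here is just a placeholder):

response = client.create_transform_job(
    TransformJobName='some-job-singlerecord',  # placeholder name
    ModelName='mypytorchmodel',
    ModelClientConfig={
        'InvocationsTimeoutInSeconds': 3600,
        'InvocationsMaxRetries': 1
    },
    BatchStrategy='SingleRecord',  # one record per invocation
    TransformInput={
        'DataSource': {
            'S3DataSource': {'S3DataType': 'S3Prefix', 'S3Uri': 's3://inputpath'}
        },
        'ContentType': 'application/json',
        'SplitType': 'Line'  # still needed so SageMaker splits the file by line
    },
    TransformOutput={
        'S3OutputPath': 's3://outputpath',
        'Accept': 'application/json',
        'AssembleWith': 'Line'
    },
    TransformResources={
        'InstanceType': 'ml.g4dn.xlarge',
        'InstanceCount': 1
    }
)

With this in place your existing input_fn keeps working, since each request body is a single JSON object.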

If you're concerned that the HTTP overhead of one request per record will limit your job's performance, you could alternatively stick with MultiRecord and edit your input_fn to handle JSONLines data. I'd probably suggest setting a different ContentType to explicitly signal to your container when to expect JSONLines vs single-record JSON. Your input_fn can then detect that request_content_type (e.g. application/x-jsonlines) and use different parsing logic.

I'm not 100% sure whether request_body supports iterating through lines like a file would ([json.loads(l) for l in request_body]), whether you could treat it like a string ([json.loads(l) for l in request_body.split("\n")]), or perhaps it's a binary string you'd need to decode first e.g. request_body.decode("utf-8").split("\n")... I'd need to check, but something along these lines should allow you to first split your body by newlines, then parse each line as a valid JSON object.
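
Untested, but a sketch along those lines could look like the following; the application/x-jsonlines content type and the bytes-vs-string handling are assumptions you'd want to verify against your container's actual behaviour:

import json

def input_fn(request_body, request_content_type):
    # request_body may arrive as bytes depending on the serving stack,
    # so decode defensively before treating it as text (assumption).
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")

    if request_content_type == "application/x-jsonlines":
        # One JSON object per non-empty line
        return [json.loads(line) for line in request_body.splitlines() if line.strip()]
    elif request_content_type == "application/json":
        # Single-record JSON, as used by your real-time endpoint
        return json.loads(request_body)
    else:
        raise ValueError(f"Unsupported content type: {request_content_type}")

Remember to also set the matching value (e.g. application/x-jsonlines) as the ContentType in TransformInput so your container receives that header.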

Alex_T (AWS Expert), answered 2 years ago
  • @Alex_T - thanks, this helps a lot, but I'm not clear on one thing: the order in which these things are executed. Based on your comments, if my input_fn reads each JSON line, e.g. input_data = [json.loads(l) for l in request_body], then once it has iterated through all the JSON objects, is predict_fn called once per JSON line, or is it called once with the whole batch and the model processes the records one at a time (or in parallel if there are multiple GPUs/CPUs)?

  • @clouduser - Just saw this. Your functions will be called once per HTTP request/response: so if input_fn decodes an application/x-jsonlines request into a batch of multiple records, your predict_fn will receive that whole batch. If you'd like to split HTTP request batches into smaller mini-batches for inference (e.g. to prevent memory issues), you could do that within predict_fn, as sketched below. Your predict_fn would then concat all the results back together before returning, and output_fn would serialize the response (back to JSONLines?).
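
A hypothetical sketch of that mini-batching inside predict_fn (the batch size and the assumption that model.predict accepts a list of records would need to be adapted to your model):

def predict_fn(input_data, model):
    # input_fn may return either a single record or a list of records
    records = input_data if isinstance(input_data, list) else [input_data]
    batch_size = 32  # assumed value; tune to your instance's memory
    results = []
    for start in range(0, len(records), batch_size):
        mini_batch = records[start:start + batch_size]
        # assumes model.predict accepts a list and returns a list of results
        results.extend(model.predict(mini_batch))
    return results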
