You're seeing this because of your `MultiRecord` batch strategy: SageMaker is aware of how to split your source data files into individual records (because you configured `SplitType`), but is composing batches with multiple records and trying to send those through to your model/endpoint. It seems like your inference input handler is not capable of interpreting JSONLines chunks, only single JSON objects.
One way of fixing this would be to switch to the `SingleRecord` batch strategy, which would result in each record triggering a separate inference request to your model.
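For reference, here's a minimal sketch of how that switch might look with the SageMaker Python SDK (the model name, instance type, and S3 paths below are placeholders):

```python
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="my-model",                   # placeholder: your deployed model
    instance_count=1,
    instance_type="ml.m5.xlarge",            # placeholder instance type
    strategy="SingleRecord",                 # one record per inference request
    assemble_with="Line",                    # join output records with newlines
    output_path="s3://my-bucket/output/",    # placeholder S3 output location
)

transformer.transform(
    data="s3://my-bucket/input/data.jsonl",  # placeholder S3 input location
    content_type="application/json",         # each request is a single JSON object
    split_type="Line",                       # split source files on newlines
)
```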
If you're concerned about the HTTP overhead of a request per record limiting your job's performance, you could alternatively stick with `MultiRecord` but edit your `input_fn` to handle JSONLines data. I'd probably suggest setting a different `ContentType` to explicitly signal to your container when to expect JSONLines vs single-record JSON. Your `input_fn` can detect that different `request_content_type` (e.g. `application/x-jsonlines`) and use different parsing logic.
I'm not 100% sure whether `request_body` supports iterating through lines like a file would (`[json.loads(l) for l in request_body]`), whether you could treat it like a string (`[json.loads(l) for l in request_body.split("\n")]`), or whether it's a binary string you'd need to decode first, e.g. `request_body.decode("utf-8").split("\n")`... I'd need to check, but something along these lines should let you split your body by newlines first, then parse each line as a valid JSON object.
@Alex_T - thanks, this helps a lot, but I'm not clear on one thing: the order in which these things execute. Based on your comments, if my `input_fn` reads each JSON line, e.g. `input_data = [json.loads(l) for l in request_body]`, then once it has iterated through all the JSON objects, is `predict_fn` called once for each JSON line? Or is it called once, with the model processing the records one at a time, or, if there are multiple GPUs/CPUs, possibly in parallel?
@clouduser - Just saw this. Your functions will be called once per HTTP request/response: so if `input_fn` decodes an `application/x-jsonlines` request into a batch of multiple records, your `predict_fn` will receive this whole batch. If you'd like to split HTTP request batches into smaller mini-batches for inference (e.g. to prevent memory issues), then you could do that within `predict_fn`. Your `predict_fn` would then concatenate all the results back together before returning, and `output_fn` would serialize the response (back to JSONLines?).
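A rough sketch of that pattern (the `model.predict` call and the mini-batch size are assumptions; adapt both to your framework and instance memory):

```python
import json

MINI_BATCH_SIZE = 32  # hypothetical size; tune for your model and instance

def predict_fn(input_data, model):
    # input_data is the full list of records that input_fn returned for this
    # HTTP request; split it into mini-batches to bound memory use.
    results = []
    for start in range(0, len(input_data), MINI_BATCH_SIZE):
        mini_batch = input_data[start:start + MINI_BATCH_SIZE]
        results.extend(model.predict(mini_batch))  # assumes a batch-capable model
    return results

def output_fn(predictions, response_content_type):
    # Serialize the whole batch back to JSONLines: one JSON object per line.
    return "\n".join(json.dumps(p) for p in predictions)
```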