Hi,
Currently I'm migrating the framework the model is served with from TensorFlow to PyTorch.
The issue I encounter is the long request-decoding time (JSON request body to dict): it takes around 200 ms just to convert the request to a dict with json.loads:
import json
import logging
from time import time

logger = logging.getLogger(__name__)

def input_fn(input_data, content_type):
    """Deserialize the JSON request body into a dict, logging the time taken."""
    time_start = time()
    input_data = json.loads(input_data)
    logger.info(f"Input serializer (input_fn) in {round(time() - time_start, 3)} seconds.")
    return input_data
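For reference, a drop-in orjson variant would look like the sketch below (assuming the package can be installed in the inference container; orjson parses in native code and accepts bytes or str):

import orjson

def input_fn(input_data, content_type):
    # orjson.loads accepts the raw bytes directly, so no separate
    # .decode("utf-8") step is needed before parsing.
    return orjson.loads(input_data)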
The issue didn't exist when the TensorFlow framework was used: with the same request, the whole model inference took around 100 ms.
I tried a different parsing library (msgspec), but the decoding time was very similar.
Could you advise a way to decode JSON faster? It somehow works with TensorFlow, so I guess there is a way.
Changing the request body type was considered, but with limited access to the sender it's cumbersome.
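For context, a minimal sketch of the kind of offline comparison I mean (payload.json is a placeholder for a representative request body):

import json
import time

import msgspec
import orjson

with open("payload.json", "rb") as f:
    raw = f.read()

# Parse the same payload with each library and report the average time.
for name, parse in [
    ("json", lambda b: json.loads(b)),
    ("orjson", lambda b: orjson.loads(b)),
    ("msgspec", lambda b: msgspec.json.decode(b)),
]:
    start = time.perf_counter()
    for _ in range(20):
        parse(raw)
    elapsed = (time.perf_counter() - start) / 20
    print(f"{name}: {elapsed * 1000:.1f} ms per parse")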
Deployment script:
import logging

from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.mode.function_pointers import Mode

model_container_image = r"763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-cpu-py310-ubuntu20.04-sagemaker-v1.2"
model_builder = ModelBuilder(
    model_path=model_path,
    schema_builder=SchemaBuilder(sample_input, sample_output),
    mode=Mode.SAGEMAKER_ENDPOINT,
    content_type='application/json',
    accept_type='application/json',
    role_arn='zzz',
    image_uri=model_container_image,
    inference_spec=YoloX(),
    log_level=logging.DEBUG,
)
built_model = model_builder.build()
built_model.deploy(
    instance_type="ml.m5.large",
    endpoint_name='zzz',
    initial_instance_count=1,
    endpoint_logging=True,
)
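(sample_input and sample_output are defined elsewhere in my script; a purely hypothetical pair, just to illustrate the shapes SchemaBuilder receives, could be:)

# Hypothetical illustration only - the real sample payloads differ.
sample_input = {"instances": [[[0.0, 0.0, 0.0]] * 640] * 640}  # image-like nested lists
sample_output = {"boxes": [[0.0, 0.0, 100.0, 100.0]], "scores": [0.9], "labels": [0]}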
Yes, I've tried msgspec, but the improvement was insufficient. Do you know if there is a way to parse JSON with TorchServe the way it is done in TensorFlow Serving?
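For completeness, the msgspec variant I tried was along these lines (a sketch; the Request fields are placeholders for my actual schema):

import msgspec

# Placeholder schema - the real request layout differs.
class Request(msgspec.Struct):
    instances: list[list[float]]

# A typed decoder skips building intermediate dicts, which is where
# msgspec usually gains over json.loads.
decoder = msgspec.json.Decoder(Request)

def input_fn(input_data, content_type):
    # input_data arrives as the raw request bytes in my container.
    return decoder.decode(input_data)  # returns a Request instance, not a dict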