How to run a TensorFlow Neuron model on a SageMaker endpoint in production


We have a HuggingFaceModel for zero-shot classification running on Neuron/Inferentia. It is based on the pretrained Hugging Face zero-shot-classification pipeline (DistilBERT) compiled with TensorFlow 2 Neuron.

We plan to use it in production, since it reduces latency from 1 s to 100 ms.

However, the SageMaker Python SDK HuggingFaceModel does not seem to support TensorFlow 2 Neuron. It gives the error shown below.

My question is: how can we run this TensorFlow 2 Neuron model on SageMaker?

  1. If HuggingFaceModel doesn't support TensorFlow 2, can you provide a PyTorch version for the Hugging Face pipeline? There isn't any example of implementing Neuron for a Hugging Face pipeline.
  2. Is there another way, such as building a custom Dockerfile? Thanks a lot.
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://sagemaker-us-west-2-**********/inf1/model.tar.gz",      # path to your model and script
   role=role,                    # iam role with permissions to create an Endpoint
   transformers_version="4.6.1",  # transformers version used
   tensorflow_version="2.4.1",        # tensorflow version used
   py_version='py37',            # python version used
)
huggingface_model._is_compiled_model = True
# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type="ml.inf1.xlarge" # AWS Inferentia Instance
)

We got the following response:

Defaulting to the only supported framework/algorithm version: 4.12.3. Ignoring framework/algorithm version: 4.6.1.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_18249/2024686093.py in <module>
      2 predictor = huggingface_model.deploy(
      3     initial_instance_count=1,      # number of instances
----> 4     instance_type="ml.inf1.xlarge" # AWS Inferentia Instance
      5 )

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/sagemaker/huggingface/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, async_inference_config, serverless_inference_config, volume_size, model_data_download_timeout, container_startup_health_check_timeout, inference_recommendation_id, **kwargs)
    303 
    304         return super(HuggingFaceModel, self).deploy(
--> 305             initial_instance_count,
    306             instance_type,
    307             serializer,

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/sagemaker/huggingface/model.py in serving_image_uri(self, region_name, instance_type, accelerator_type, serverless_inference_config)

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/sagemaker/workflow/utilities.py in wrapper(*args, **kwargs)
    386 
    387 
--> 388 def execute_job_functions(step_args: _StepArguments):
    389     """Execute the job class functions during pipeline definition construction
    390 

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/sagemaker/image_uris.py in retrieve(framework, region, version, py_version, instance_type, accelerator_type, image_scope, container_version, distribution, base_framework_version, training_compiler_config, model_id, model_version, tolerate_vulnerable_model, tolerate_deprecated_model, sdk_version, inference_tool, serverless_inference_config)
    172             )
    173         _validate_arg(full_base_framework_version, list(version_config.keys()), "base framework")
--> 174         version_config = version_config.get(full_base_framework_version)
    175 
    176     py_version = _validate_py_version_and_set_if_needed(py_version, version_config, framework)

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/sagemaker/image_uris.py in _validate_arg(arg, available_options, arg_name)
    569     """Creates a tag for the image URI."""
    570     if inference_tool:
--> 571         return "-".join(x for x in (tag_prefix, inference_tool, py_version, container_version) if x)
    572     return "-".join(x for x in (tag_prefix, processor, py_version, container_version) if x)
    573 

ValueError: Unsupported base framework: tensorflow2.4.1. You may need to upgrade your SDK version (pip install -U sagemaker) for newer base frameworks. Supported base framework(s): version_aliases, pytorch1.9.1.
1 Answer

Hi Xin Tong, Thanks for posting the question. HuggingFace Neuron Inference Containers are currently only available for PyTorch. Please file a feature request on https://github.com/aws/deep-learning-containers for TensorFlow 2.x HuggingFace Neuron Inference Container support.
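For reference, a PyTorch-based deployment along these lines should work on Inferentia. This is only a sketch: the bucket path is a placeholder, and the version pins are taken from the supported combination reported in the error message above (`transformers 4.12.3`, `pytorch1.9.1`), so verify them against the currently available HuggingFace Neuron inference containers before using.

```python
# Sketch: deploying a Neuron-compiled PyTorch HuggingFace model on Inferentia.
# Versions below are assumptions based on the error message in the question,
# not verified values -- check the available Neuron inference containers.
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://<your-bucket>/inf1/model.tar.gz",  # Neuron-compiled model archive
    role=role,                      # IAM role with permission to create an endpoint
    transformers_version="4.12.3",  # version the SDK reported as supported
    pytorch_version="1.9.1",        # base framework the SDK reported as supported
    py_version="py37",
)
huggingface_model._is_compiled_model = True

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf1.xlarge",  # AWS Inferentia instance
)
```

Note that the model archive must already be compiled for Neuron; the SDK flag `_is_compiled_model = True` only prevents SageMaker from attempting its own compilation step.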

answered a year ago
