TensorFlow trained model endpoint deployment

Hi

I trained a TensorFlow model in SageMaker AI with a custom training script, using the following estimator:

estimator = TensorFlow(
    entry_point="sklearn_model_select.py",
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    sagemaker_session=sagemaker_session,
    base_job_name='NN-consumer-attrition',
    max_run=32000,
    output_path=training_output_path,
    framework_version='2.12',
    py_version='py310',
    hyperparameters={"verbose": 0, "epochs": 100, "batch_size": 64, 'neurons': 128, 'layers': 1},
)
estimator.fit({"train": train_path, "test": test_path}, wait=True)

The training job logged this warning:

sagemaker_tensorflow_container.training WARNING Your model will NOT be servable with SageMaker TensorFlow Serving container. The model artifact was not saved in the TensorFlow SavedModel directory structure: https://www.tensorflow.org/guide/saved_model#structure_of_a_savedmodel_directory

In my custom script I build the model with KerasClassifier, so it is saved with:

path = os.path.join(args.model_dir, "model.h5")
tf.keras.models.save_model(model.model, path)

I tried the following code to deploy the model as an endpoint:

model_artifacts_url = estimator.model_data
script_path = "sklearn_model_select.py"
sagemaker_model = TensorFlowModel(
    model_data=model_artifacts_url,
    entry_point=script_path,
    role=role,
    framework_version="2.12",
)

predictor = sagemaker_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.2xlarge",
    endpoint_name=endpoint_name,
)

It failed with the error "no SavedModel bundles found!"

So the TensorFlow container didn't actually save the model in model_data, right?

What next steps should I follow to deploy the model as an endpoint? Do I have to manually create a .tar.gz file containing model.h5? Is there an easier way to achieve this?

Thanks in advance.

asked 2 months ago, 57 views
2 Answers
To fix the "no SavedModel bundles found!" error, change how the model is saved:

The Root Cause

SageMaker's TensorFlow Serving container looks for a numbered version directory (for example 1/) that contains a saved_model.pb file. Your current script saves a single .h5 file, which the container cannot discover or load.
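To check an existing artifact against this rule before deploying, something like the following may help. It is a minimal sketch using only the Python standard library. The helper names find_saved_model_versions and make_archive are hypothetical, and it assumes the artifact is a gzip-compressed tar and that a version directory is recognised purely by its all-digit name.

```python
import io
import tarfile
from pathlib import PurePosixPath


def find_saved_model_versions(fileobj):
    """Return the sorted version numbers whose directory holds saved_model.pb."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        names = tar.getnames()
    versions = set()
    for name in names:
        parts = PurePosixPath(name).parts
        # A servable layout has saved_model.pb directly inside a numeric directory.
        if len(parts) >= 2 and parts[-1] == "saved_model.pb" and parts[-2].isdigit():
            versions.add(int(parts[-2]))
    return sorted(versions)


def make_archive(paths):
    """Build an in-memory .tar.gz of empty files (demo data only, not a real model)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in paths:
            info = tarfile.TarInfo(name=path)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    return buf


# Archive shaped like the one in the question: a lone H5 file.
print(find_saved_model_versions(make_archive(["model.h5"])))

# Archive shaped like a SavedModel export under version directory "1".
print(find_saved_model_versions(
    make_archive(["1/saved_model.pb", "1/variables/variables.index"])
))
```

For a real artifact, download the file behind estimator.model_data and pass open("model.tar.gz", "rb") instead of the demo archive. An empty list means the archive holds no versioned SavedModel, which matches the error in the question.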

The Essential Fix: Update your Training Script

Modify your sklearn_model_select.py to save in the TensorFlow SavedModel format inside a numbered folder. Replace your save logic with this:

# Create a versioned subdirectory (e.g., '1') inside model_dir
export_path = os.path.join(args.model_dir, "1")

# Save in TensorFlow SavedModel format (creates .pb and variables folder)
model.model.save(export_path, save_format='tf')

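If you cannot re-train (convert the existing model.h5)

If re-training is not an option, note that the TensorFlow Serving container does not load models through a model_fn. Its inference.py hooks (input_handler and output_handler, or a single handler) only pre- and post-process requests. Instead, download and extract the existing model.tar.gz, then re-export the .h5 file as a SavedModel with a short script run locally or in a notebook:

import os
import tensorflow as tf

# Load the Keras model from the extracted archive (assumed here to be in ./model)
model = tf.keras.models.load_model(os.path.join("model", "model.h5"))

# Re-export it as a SavedModel inside a versioned directory
model.save(os.path.join("export", "1"), save_format="tf")

Note: Use the same TensorFlow version as in training (2.12 here). Pack the contents of export/ into a new model.tar.gz, upload it to S3, and pass that S3 URI as model_data in your TensorFlowModel object. No entry_point is needed for this.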
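On the manual repackaging raised in the question: what matters is that the version directory sits at the top level of the archive. Below is a minimal standard-library sketch that packs a directory already holding a SavedModel export (called export/ here) into model.tar.gz. The function name package_saved_model and the directory name are hypothetical, and uploading the result to S3 is left out.

```python
import os
import tarfile


def package_saved_model(export_dir, output_path="model.tar.gz"):
    """Pack each numeric version directory under export_dir into a gzip tarball.

    Entries are stored relative to export_dir, so "1/saved_model.pb"
    ends up at the top level of the archive.
    """
    versions = sorted(
        (
            d for d in os.listdir(export_dir)
            if d.isdigit()
            and os.path.isfile(os.path.join(export_dir, d, "saved_model.pb"))
        ),
        key=int,
    )
    if not versions:
        raise FileNotFoundError(
            f"no <version>/saved_model.pb found under {export_dir!r}"
        )
    with tarfile.open(output_path, "w:gz") as tar:
        for version in versions:
            # arcname drops the export_dir prefix from the stored paths.
            tar.add(os.path.join(export_dir, version), arcname=version)
    return output_path
```

Calling package_saved_model("export") writes model.tar.gz in the current directory. The function raises FileNotFoundError if the directory has no versioned SavedModel, so a stray .h5-only folder is caught before upload.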

Recommendation: Use the training script update. It produces an artifact that the serving container can load directly, with no manual conversion or repackaging.

EXPERT
answered 2 months ago

The issue you're encountering is that your model is being saved in the H5 format (model.h5), but the SageMaker TensorFlow Serving container expects models to be saved in the TensorFlow SavedModel format, which has a specific directory structure.

To resolve this and successfully deploy your model as an endpoint, you have two options:

Option 1: Save in SavedModel Format (Recommended)

Modify your custom training script to save the model in the SavedModel format instead of H5. Replace your current save code with:

path = os.path.join(args.model_dir, "1")  # Note: "1" represents the model version
tf.keras.models.save_model(model.model, path, save_format='tf')

This will create the proper SavedModel directory structure that SageMaker expects, and your deployment should work without additional changes.

Option 2: Convert the Existing Model Artifacts

If you've already trained your model and only have the H5 file, the TensorFlow Serving container cannot load it directly. The inference script (entry_point) only handles request pre- and post-processing, not model loading. Instead, extract model.tar.gz into a local directory (model_dir below), load the H5 file there, and re-save it in SavedModel format before re-packaging and uploading the archive:

model = tf.keras.models.load_model(os.path.join(model_dir, 'model.h5'))
model.save(os.path.join(model_dir, '1'), save_format='tf')

The SavedModel format is the preferred approach because it is the standard format that TensorFlow Serving containers load, which ensures compatibility with SageMaker's hosting services.

After making these changes, your existing deployment code using TensorFlowModel and the deploy() method should work correctly to create your endpoint.
Sources
Resources for using TensorFlow with Amazon SageMaker AI - Amazon SageMaker AI

answered 2 months ago
EXPERT
reviewed 2 months ago
