AWS Sagemaker: UnexpectedStatusException: Compilation job Failed. Reason: ClientError: InputConfiguration: Please make sure input config is correct - Input 1 of node StatefulPartitionedCall was passed

Question

I am trying to convert a pre-trained NASNetMobile model into an AWS Neo optimized model.
I am following [this guide](https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/). The only difference is that I am using TensorFlow 2.6 and Python 3.7.10. The other dependencies are:
boto                               2.49.0
boto3                              1.20.30
botocore                           1.23.30
keras                              2.7.0
Keras-Preprocessing                1.1.2
numpy                              1.19.5
safety                             1.10.3
sagemaker                          2.74.0
sagemaker-pyspark                  1.4.2
tensorflow                         2.6.0
tensorflow-cpu                     2.6.0
tensorflow-estimator               2.6.0
tensorflow-gpu                     2.6.0
tensorflow-io-gcs-filesystem       0.23.1
tensorflow-serving-api             2.6.0

I got stuck at step 7 of the guide. Here is my code; the failing call is marked near the end:
```
from tensorflow.keras.applications import nasnet
from keras.preprocessing import image
import h5py
import numpy as np
from keras.models import load_model

# Load h5 file
loaded_model = load_model('myy_model.h5')

model_version = '2'
export_dir = 'export/Servo/' + model_version

# Save in pb format
loaded_model.save(export_dir)

# Save as tar file
import tarfile
model_archive = 'model.tar.gz'
with tarfile.open(model_archive, mode='w:gz') as archive:
    archive.add('export', recursive=True)

from sagemaker import get_execution_role
from sagemaker import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

model_data = sess.upload_data(path=model_archive, key_prefix='model')

from sagemaker.tensorflow import TensorFlowModel

sagemaker_model = TensorFlowModel(model_data=model_data, 
                      framework_version=tf_framework_version,
                      role=role)
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

#Load  jpeg image from local and set target size to 224 x 224
img = image.load_img('eagle.jpg', target_size=(224, 224))

#convert image to array
input_img = image.img_to_array(img)
input_img = np.expand_dims(input_img, axis=0)
input_img = nasnet.preprocess_input(input_img)

predict_img = predictor.predict(input_img)
predict_img

# Everything works up to this point

instance_family = 'ml_c5'
framework = 'tensorflow'
compilation_job_name = 'keras-compile-16'
# output path for compiled model artifact
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)

data_shape = {'input_2':[1,224,224,3]}

# The call below fails
optimized_estimator = sagemaker_model.compile(target_instance_family=instance_family,
                                         input_shape=data_shape,
                                         job_name=compilation_job_name,
                                         role=role,
                                         framework=framework,
                                         framework_version=tf_framework_version,
                                         output_path=compiled_model_path
                                        )

```

data_shape = {'input_2':[1,224,224,3]}
"input_2" is the right input name; I have checked it in the model and with Netron (https://netron.app/).

Full Error Logs:

UnexpectedStatusException                 Traceback (most recent call last)
 in 
      5                                          framework=framework,
      6                                          framework_version=tf_framework_version,
----> 7                                          output_path=compiled_model_path
      8                                         )

~/anaconda3/envs/tensorflow2_p37/lib/python3.7/site-packages/sagemaker/model.py in compile(self, target_instance_family, input_shape, output_path, role, tags, job_name, compile_max_run, framework, framework_version, target_platform_os, target_platform_arch, target_platform_accelerator, compiler_options)
    659         )
    660         self.sagemaker_session.compile_model(**config)
--> 661         job_status = self.sagemaker_session.wait_for_compilation_job(job_name)
    662         self.model_data = job_status["ModelArtifacts"]["S3ModelArtifacts"]
    663         if target_instance_family is not None:

~/anaconda3/envs/tensorflow2_p37/lib/python3.7/site-packages/sagemaker/session.py in wait_for_compilation_job(self, job, poll)
   3224         """
   3225         desc = _wait_until(lambda: _compilation_job_status(self.sagemaker_client, job), poll)
-> 3226         self._check_job_status(job, desc, "CompilationJobStatus")
   3227         return desc
   3228

~/anaconda3/envs/tensorflow2_p37/lib/python3.7/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3341                 ),
   3342                 allowed_statuses=["Completed", "Stopped"],
-> 3343                 actual_status=status,
   3344             )
   3345

UnexpectedStatusException: Error for Compilation job keras-compile-14: Failed. Reason: ClientError: InputConfiguration: Please make sure input config is correct - Input 1 of node StatefulPartitionedCall was passed float from stem_conv1/kernel:0 incompatible with expected resource.

Thanks in advance.

Answer

It is hard to say whether the input tensor name is wrong without access to the notebook and the other artifacts, but my guess is that it is. When you don't name the input yourself, TF generates the name dynamically, and the counter is incremented each time you load the model in the same runtime session: it starts with input_1, then input_2, and so on.

Another point is that you are invoking Neo through the SageMaker Python SDK (sagemaker_model.compile). That library is sometimes outdated, so I prefer to call boto3 directly. I wrote some sample code to test the compilation. The following code works with NASNetMobile:

```
## Load a pre-trained Keras model and export in Tensorflow format
import tensorflow as tf
from tensorflow.keras.applications import nasnet
import numpy as np
model = tf.keras.applications.nasnet.NASNetMobile(weights='imagenet')
x = tf.random.uniform((1, 224, 224, 3))
y = model(x)
export_dir = 'export/1'
tf.saved_model.save(model, export_dir)

## Put the model into a tar ball and send to S3
import sagemaker
import tarfile
import io

model_archive = 'model.tar.gz'
model_name='nasnet-mobile'
img_size=224
sagemaker_session = sagemaker.Session()
default_bucket = sagemaker_session.default_bucket()

with io.BytesIO() as f:
    with tarfile.open(fileobj=f, mode="w:gz") as tar:
        tar.add('export')
        tar.list()
    f.seek(0)
    s3_uri = sagemaker_session.upload_string_as_file_body(f.read(), default_bucket, f"models/{model_name}/model.tar.gz")
    print(s3_uri)

## Now kick-off a compilation job
import time
import boto3
import sagemaker

role = sagemaker.get_execution_role()
sm_client = boto3.client('sagemaker')

framework='tensorflow'
img_size=224
input_shape=f"1,{img_size},{img_size},3"
compilation_job_name = f'{model_name}-{framework}-{int(time.time()*1000)}'

sm_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn=role,
    InputConfig={
        'S3Uri': s3_uri,
        'DataInputConfig': f'{{"{model.layers[0].name}": [{input_shape}]}}',
        'Framework': framework.upper(),
        'FrameworkVersion': '2.4'
    },
    OutputConfig={
        'S3OutputLocation': f's3://{default_bucket}/{model_name}-{framework}/optimized/',
        'TargetDevice': 'ml_c5',
        # Comment or change the following line depending on your edge device
        # Jetson Xavier: sm_72; Jetson Nano: sm_53
        #'CompilerOptions': '{"trt-ver": "7.1.3", "cuda-ver": "10.2", "gpu-code": "sm_72"}' # Jetpack 4.4.1
    },
    StoppingCondition={ 'MaxRuntimeInSeconds': 18000 }
)
while True:
    resp = sm_client.describe_compilation_job(CompilationJobName=compilation_job_name)    
    if resp['CompilationJobStatus'] in ['STARTING', 'INPROGRESS']:
        print('Running...')
    else:
        print(resp['CompilationJobStatus'], compilation_job_name)
        break
    time.sleep(5)
```
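
The DataInputConfig value in the call above is assembled by hand with an f-string. As a minimal sketch, the same JSON string could be produced with json.dumps, which sidesteps quoting and brace-escaping mistakes. The helper name build_data_input_config is hypothetical; it is not part of boto3 or the SageMaker SDK.

```python
import json

def build_data_input_config(input_name, shape):
    """Return the JSON string passed as InputConfig['DataInputConfig'].

    input_name: name of the model's input tensor, for example "input_1".
    shape: iterable of ints, for example (1, 224, 224, 3).
    """
    return json.dumps({input_name: [int(dim) for dim in shape]})

# The tensor name here is only illustrative; use the name your model reports.
print(build_data_input_config("input_1", (1, 224, 224, 3)))
```

With the variables from the code above, the call would be build_data_input_config(model.layers[0].name, (1, img_size, img_size, 3)).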

Note that I'm using FrameworkVersion 2.4. Your sample does not show what value tf_framework_version holds.
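
The polling loop in the code above prints only the final status. As a rough sketch, it could be wrapped in a function that also returns the FailureReason field of the describe_compilation_job response and gives up after a time limit. The function name wait_for_compilation_job and its parameters are hypothetical, not part of boto3; the describe callable is passed in so the logic can be tried without AWS access.

```python
import time

def wait_for_compilation_job(describe, job_name, poll_seconds=5,
                             timeout_seconds=1800, sleep=time.sleep):
    """Poll until the job leaves the STARTING/INPROGRESS states.

    describe: a callable such as sm_client.describe_compilation_job.
    Returns (status, failure_reason); failure_reason is None when the
    response carries no FailureReason field.
    """
    waited = 0
    while True:
        resp = describe(CompilationJobName=job_name)
        status = resp['CompilationJobStatus']
        if status not in ('STARTING', 'INPROGRESS'):
            return status, resp.get('FailureReason')
        if waited >= timeout_seconds:
            raise TimeoutError(f'{job_name} still {status} after {waited}s')
        sleep(poll_seconds)
        waited += poll_seconds
```

With the client from the code above, this would be called as status, reason = wait_for_compilation_job(sm_client.describe_compilation_job, compilation_job_name), and printing reason shows the same message that the SDK raises as UnexpectedStatusException.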
