AWS Sagemaker: UnexpectedStatusException: Compilation job Failed. Reason: ClientError: InputConfiguration: Please make sure input config is correct - Input 1 of node StatefulPartitionedCall was passed


I am trying to convert a pre-trained NASNetMobile model into an AWS Neo optimized model. I am following this blog post: https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/. The only difference is that I am using TensorFlow 2.6 and Python 3.7.10. Other dependencies:

boto 2.49.0
boto3 1.20.30
botocore 1.23.30
keras 2.7.0
Keras-Preprocessing 1.1.2
numpy 1.19.5
safety 1.10.3
sagemaker 2.74.0
sagemaker-pyspark 1.4.2
tensorflow 2.6.0
tensorflow-cpu 2.6.0
tensorflow-estimator 2.6.0
tensorflow-gpu 2.6.0
tensorflow-io-gcs-filesystem 0.23.1
tensorflow-serving-api 2.6.0

I am stuck at step 7 of the blog post, where the compilation call fails with an error. This is my code:

from tensorflow.keras.applications import nasnet
from keras.preprocessing import image
import h5py
import numpy as np
from keras.models import load_model

# Load h5 file
loaded_model = load_model('myy_model.h5')

model_version = '2'
export_dir = 'export/Servo/' + model_version

# Save in pb format
loaded_model.save(export_dir)

# Save as tar file
import tarfile
model_archive = 'model.tar.gz'
with tarfile.open(model_archive, mode='w:gz') as archive:
    archive.add('export', recursive=True)

from sagemaker import get_execution_role
from sagemaker import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

model_data = sess.upload_data(path=model_archive, key_prefix='model')

from sagemaker.tensorflow import TensorFlowModel

sagemaker_model = TensorFlowModel(model_data=model_data, 
                      framework_version=tf_framework_version,
                      role=role)
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

# Load a JPEG image from disk and resize it to 224 x 224
img = image.load_img('eagle.jpg', target_size=(224, 224))

# Convert the image to an array
input_img = image.img_to_array(img)
input_img = np.expand_dims(input_img, axis=0)
input_img = nasnet.preprocess_input(input_img)


predict_img = predictor.predict(input_img)
predict_img

# Everything works up to this point

instance_family = 'ml_c5'
framework = 'tensorflow'
compilation_job_name = 'keras-compile-16'
# output path for compiled model artifact
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)

data_shape = {'input_2':[1,224,224,3]}

# The call below fails
optimized_estimator = sagemaker_model.compile(target_instance_family=instance_family,
                                         input_shape=data_shape,
                                         job_name=compilation_job_name,
                                         role=role,
                                         framework=framework,
                                         framework_version=tf_framework_version,
                                         output_path=compiled_model_path
                                        )

In data_shape = {'input_2':[1,224,224,3]}, "input_2" is the correct input name. I have checked it both in the model and with https://netron.app/.

Full Error Logs:

UnexpectedStatusException                 Traceback (most recent call last)
<ipython-input-140-c16ae7ba24de> in <module>
      5                                          framework=framework,
      6                                          framework_version=tf_framework_version,
----> 7                                          output_path=compiled_model_path
      8                                         )

~/anaconda3/envs/tensorflow2_p37/lib/python3.7/site-packages/sagemaker/model.py in compile(self, target_instance_family, input_shape, output_path, role, tags, job_name, compile_max_run, framework, framework_version, target_platform_os, target_platform_arch, target_platform_accelerator, compiler_options)
    659             )
    660         self.sagemaker_session.compile_model(**config)
--> 661         job_status = self.sagemaker_session.wait_for_compilation_job(job_name)
    662         self.model_data = job_status["ModelArtifacts"]["S3ModelArtifacts"]
    663         if target_instance_family is not None:

~/anaconda3/envs/tensorflow2_p37/lib/python3.7/site-packages/sagemaker/session.py in wait_for_compilation_job(self, job, poll)
   3224         """
   3225         desc = _wait_until(lambda: _compilation_job_status(self.sagemaker_client, job), poll)
-> 3226         self._check_job_status(job, desc, "CompilationJobStatus")
   3227         return desc
   3228

~/anaconda3/envs/tensorflow2_p37/lib/python3.7/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
   3341                 ),
   3342                 allowed_statuses=["Completed", "Stopped"],
-> 3343                 actual_status=status,
   3344             )
   3345

UnexpectedStatusException: Error for Compilation job keras-compile-14: Failed. Reason: ClientError: InputConfiguration: Please make sure input config is correct - Input 1 of node StatefulPartitionedCall was passed float from stem_conv1/kernel:0 incompatible with expected resource.

Thanks in Advance

asked 2 years ago, 420 views
1 Answer

Without access to the notebook and the other artifacts it is hard to be sure, but my guess is that the name of the input tensor is wrong. When you don't specify a name, TF generates one dynamically, and the counter is incremented each time you load the model in the same runtime session: it starts with input_1, then input_2, and so on.

Another point is that you're invoking Neo through the SageMaker Python SDK (sagemaker_model.compile). This library is sometimes outdated, so I prefer to use boto3 instead. I wrote some sample code to test the compilation. The following code works with your model:

## Load a pre-trained Keras model and export in Tensorflow format
import tensorflow as tf
from tensorflow.keras.applications import nasnet
import numpy as np
model = tf.keras.applications.nasnet.NASNetMobile(weights='imagenet')
x = tf.random.uniform((1, 224, 224, 3))
y = model(x)
export_dir = 'export/1'
tf.saved_model.save(model, export_dir)

## Put the model into a tar ball and send to S3
import sagemaker
import tarfile
import io

model_archive = 'model.tar.gz'
model_name='nasnet-mobile'
img_size=224
sagemaker_session = sagemaker.Session()
default_bucket = sagemaker_session.default_bucket()

with io.BytesIO() as f:
    with tarfile.open(fileobj=f, mode="w:gz") as tar:
        tar.add('export')
        tar.list()
    f.seek(0)
    s3_uri = sagemaker_session.upload_string_as_file_body(f.read(), default_bucket, f"models/{model_name}/model.tar.gz")
    print(s3_uri)

## Now kick-off a compilation job
import time
import boto3
import sagemaker

role = sagemaker.get_execution_role()
sm_client = boto3.client('sagemaker')

framework='tensorflow'
img_size=224
input_shape=f"1,{img_size},{img_size},3"
compilation_job_name = f'{model_name}-{framework}-{int(time.time()*1000)}'

sm_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn=role,
    InputConfig={
        'S3Uri': s3_uri,
        'DataInputConfig': f'{{"{model.layers[0].name}": [{input_shape}]}}',
        'Framework': framework.upper(),
        'FrameworkVersion': '2.4'
    },
    OutputConfig={
        'S3OutputLocation': f's3://{default_bucket}/{model_name}-{framework}/optimized/',
        'TargetDevice': 'ml_c5',
        # Comment or change the following line depending on your edge device
        # Jetson Xavier: sm_72; Jetson Nano: sm_53
        #'CompilerOptions': '{"trt-ver": "7.1.3", "cuda-ver": "10.2", "gpu-code": "sm_72"}' # Jetpack 4.4.1
    },
    StoppingCondition={ 'MaxRuntimeInSeconds': 18000 }
)
while True:
    resp = sm_client.describe_compilation_job(CompilationJobName=compilation_job_name)    
    if resp['CompilationJobStatus'] in ['STARTING', 'INPROGRESS']:
        print('Running...')
    else:
        print(resp['CompilationJobStatus'], compilation_job_name)
        break
    time.sleep(5)
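
The polling loop above runs until the job leaves the STARTING or INPROGRESS status. A minimal sketch of the same idea with a timeout and the failure message returned to the caller is shown here. The name wait_for_compilation and its parameters are hypothetical, not part of boto3; the function only assumes that the callable it receives returns a dict shaped like the describe_compilation_job response.

```python
import time

# Statuses that mean the job has not reached a final state yet.
PENDING_STATUSES = {"STARTING", "INPROGRESS", "STOPPING"}


def wait_for_compilation(describe, job_name, poll_seconds=5, timeout_seconds=1800):
    """Poll `describe` until the job reaches a final status or the timeout expires.

    `describe` is any callable that accepts CompilationJobName=... and returns
    a dict with a "CompilationJobStatus" key (hypothetical helper, for illustration).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        resp = describe(CompilationJobName=job_name)
        status = resp["CompilationJobStatus"]
        if status not in PENDING_STATUSES:
            # FailureReason is only expected when the job failed; default to "".
            return status, resp.get("FailureReason", "")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_name} is still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

With the client from the code above, this could be called as status, reason = wait_for_compilation(sm_client.describe_compilation_job, compilation_job_name), and reason would then hold the same "ClientError: InputConfiguration: ..." text that the SDK raises as an exception.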

Note that I'm using FrameworkVersion '2.4'. Your sample does not show which value tf_framework_version holds.
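
The DataInputConfig value in the call above is a JSON string assembled with an f-string and escaped braces, which is easy to get subtly wrong. A minimal sketch of building the same string with json.dumps follows. The helper name build_data_input_config is hypothetical (it is not part of boto3 or the SageMaker SDK), and the input name is passed in rather than hard-coded, since it can change between sessions.

```python
import json


def build_data_input_config(input_name, shape):
    """Return the JSON string that maps one input tensor name to its shape."""
    # Hypothetical helper for illustration; json.dumps takes care of
    # quoting and braces, so no manual escaping is needed.
    return json.dumps({input_name: list(shape)})


print(build_data_input_config("input_1", (1, 224, 224, 3)))
```

In the code above this could replace the f-string, for example 'DataInputConfig': build_data_input_config(model.layers[0].name, (1, img_size, img_size, 3)), so that the name is always read from the loaded model instead of being typed by hand.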

AWS
answered 2 years ago
