tensorflow.neuron only compiles a small subset of all operations

0

We are using tensorflow.neuron to compile a TensorFlow 1.x SavedModel to run on AWS Inferentia (Inf1) instances on EC2. We do this by calling tensorflow.neuron.saved_model.compile(model_dir, compiled_model_dir).

36 subgraphs compile successfully, while two produce warnings:

WARNING:tensorflow:Failed to fuse subgraph {subgraph neuron_op_cd169bb88475d5d0 with input tensors ["<tf.Tensor 'encoder/c0/_52:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/c_70/_53:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/h0/_54:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/c_40/_55:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/h_10/_56:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/c_20/_57:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/c_50/_58:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/c_10/_59:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/c_60/_60:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/h_60/_61:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/h_50/_62:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/h_30/_63:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'downsampler2/3/add0/_64:0' shape=(1, 16, 176, 128) dtype=float32>", "<tf.Tensor 'downsampler2/3/elu/Elu0/_65:0' shape=(1, 16, 176, 128) dtype=float32>", "<tf.Tensor 'encoder/c_30/_66:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/h_40/_67:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/h_70/_68:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/h_20/_69:0' shape=(1, 88, 128) dtype=float32>"], output tensors ["<tf.Tensor 'encoder/stack:0' shape=(1, 16, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_0/cell_0/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_0/cell_0/conv_lstm_cell/mul_127:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_1/cell_1/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_1/cell_1/conv_lstm_cell/mul_127:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 
'encoder/encoder/multi_rnn_cell/cell_2/cell_2/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_2/cell_2/conv_lstm_cell/mul_127:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_3/cell_3/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_3/cell_3/conv_lstm_cell/mul_127:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_4/cell_4/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_4/cell_4/conv_lstm_cell/mul_127:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_5/cell_5/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_5/cell_5/conv_lstm_cell/mul_127:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_6/cell_6/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_6/cell_6/conv_lstm_cell/mul_127:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_7/cell_7/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0' shape=(1, 88, 128) dtype=float32>", "<tf.Tensor 'encoder/encoder/multi_rnn_cell/cell_7/cell_7/conv_lstm_cell/mul_127:0' shape=(1, 88, 128) dtype=float32>"]} with '/home/ec2-user/tensorflow_venv/bin/neuron-cc compile /tmp/tmpqa4twpbj/neuron_op_cd169bb88475d5d0/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpqa4twpbj/neuron_op_cd169bb88475d5d0/graph_def.neff --io-config "{\"inputs\": {\"encoder/c0/_52:0\": [[1, 88, 128], \"float32\"], \"encoder/c_70/_53:0\": [[1, 88, 128], \"float32\"], \"encoder/h0/_54:0\": [[1, 88, 128], \"float32\"], \"encoder/c_40/_55:0\": 
[[1, 88, 128], \"float32\"], \"encoder/h_10/_56:0\": [[1, 88, 128], \"float32\"], \"encoder/c_20/_57:0\": [[1, 88, 128], \"float32\"], \"encoder/c_50/_58:0\": [[1, 88, 128], \"float32\"], \"encoder/c_10/_59:0\": [[1, 88, 128], \"float32\"], \"encoder/c_60/_60:0\": [[1, 88, 128], \"float32\"], \"encoder/h_60/_61:0\": [[1, 88, 128], \"float32\"], \"encoder/h_50/_62:0\": [[1, 88, 128], \"float32\"], \"encoder/h_30/_63:0\": [[1, 88, 128], \"float32\"], \"downsampler2/3/add0/_64:0\": [[1, 16, 176, 128], \"float32\"], \"downsampler2/3/elu/Elu0/_65:0\": [[1, 16, 176, 128], \"float32\"], \"encoder/c_30/_66:0\": [[1, 88, 128], \"float32\"], \"encoder/h_40/_67:0\": [[1, 88, 128], \"float32\"], \"encoder/h_70/_68:0\": [[1, 88, 128], \"float32\"], \"encoder/h_20/_69:0\": [[1, 88, 128], \"float32\"]}, \"outputs\": [\"encoder/stack:0\", \"encoder/encoder/multi_rnn_cell/cell_0/cell_0/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0\", \"encoder/encoder/multi_rnn_cell/cell_0/cell_0/conv_lstm_cell/mul_127:0\", \"encoder/encoder/multi_rnn_cell/cell_1/cell_1/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0\", \"encoder/encoder/multi_rnn_cell/cell_1/cell_1/conv_lstm_cell/mul_127:0\", \"encoder/encoder/multi_rnn_cell/cell_2/cell_2/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0\", \"encoder/encoder/multi_rnn_cell/cell_2/cell_2/conv_lstm_cell/mul_127:0\", \"encoder/encoder/multi_rnn_cell/cell_3/cell_3/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0\", \"encoder/encoder/multi_rnn_cell/cell_3/cell_3/conv_lstm_cell/mul_127:0\", \"encoder/encoder/multi_rnn_cell/cell_4/cell_4/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0\", \"encoder/encoder/multi_rnn_cell/cell_4/cell_4/conv_lstm_cell/mul_127:0\", \"encoder/encoder/multi_rnn_cell/cell_5/cell_5/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0\", \"encoder/encoder/multi_rnn_cell/cell_5/cell_5/conv_lstm_cell/mul_127:0\", \"encoder/encoder/multi_rnn_cell/cell_6/cell_6/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0\", 
\"encoder/encoder/multi_rnn_cell/cell_6/cell_6/conv_lstm_cell/mul_127:0\", \"encoder/encoder/multi_rnn_cell/cell_7/cell_7/conv_lstm_cell/LayerNorm_79/batchnorm/add_1:0\", \"encoder/encoder/multi_rnn_cell/cell_7/cell_7/conv_lstm_cell/mul_127:0\"]}" --neuroncore-pipeline-cores 4'
WARNING:tensorflow:Failed to fuse subgraph {subgraph neuron_op_1d250d6fb9ac33f with input tensors ["<tf.Tensor 'upsampler1/GatherV20/_77:0' shape=(1, 32, 88, 128) dtype=float32>"], output tensors ["<tf.Tensor 'upsampler1/LayerNorm/batchnorm/add_1:0' shape=(1, 32, 88, 128) dtype=float32>", "<tf.Tensor 'upsampler1/0/downsample/BiasAdd:0' shape=(1, 32, 88, 32) dtype=float32>"]} with '/home/ec2-user/tensorflow_venv/bin/neuron-cc compile /tmp/tmpqa4twpbj/neuron_op_1d250d6fb9ac33f/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpqa4twpbj/neuron_op_1d250d6fb9ac33f/graph_def.neff --io-config "{\"inputs\": {\"upsampler1/GatherV20/_77:0\": [[1, 32, 88, 128], \"float32\"]}, \"outputs\": [\"upsampler1/LayerNorm/batchnorm/add_1:0\", \"upsampler1/0/downsample/BiasAdd:0\"]}" --neuroncore-pipeline-cores 4'

These two subgraphs take around five hours to compile and use an immense amount of RAM in the process (80-100 GB). This doesn't make much sense, since the model itself is a fairly small ConvLSTM of around 62 MB (17 MB when optimized for inference). The compilation does eventually finish, but when we test the resulting model on Inf1 the performance is terrible: more than twice as slow as the same model on Amazon EI.

Looking closer at the logs, I noticed that only a small subset of operations is actually placed on Neuron:

2024-04-03 09:07:54.612207: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 56 ops of 4 different types in the graph that are not compiled by neuron-cc: GatherV2, Elu, NoOp, Placeholder, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
2024-04-03 09:08:24.373061: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: graph_to_optimize
2024-04-03 09:08:24.373111: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   aws_neuron_static_shape_inference: Graph size after: 13757 nodes (0), 19247 edges (0), time = 242.094ms.
2024-04-03 09:08:24.373117: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   aws_neuron_fuse_supported_operators: Graph size after: 111 nodes (-13646), 158 edges (-19089), time = 29833.4043ms.
INFO:tensorflow:Number of operations in TensorFlow session: 13757
INFO:tensorflow:Number of operations after tf.neuron optimizations: 13758
INFO:tensorflow:Number of operations placed on Neuron runtime: 487

As you can see, there are 56 unsupported operations in the graph, which seems insignificant out of a total of 13757 operations, yet only 487 operations are placed on the Neuron runtime. Why is that?
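For anyone who wants to pull these numbers out of a longer compile log, here is a minimal sketch in plain Python. The helper names (parse_unsupported, parse_passes) are hypothetical, and the regular expressions assume the log lines look exactly like the excerpt above; other tensorflow-neuron versions may format them differently. It only extracts what the log already states: the unsupported op types and the node count each grappler pass reports.

```python
import re

# Excerpt copied from the compile log above (timestamps dropped).
LOG = """\
There are 56 ops of 4 different types in the graph that are not compiled by neuron-cc: GatherV2, Elu, NoOp, Placeholder, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
  aws_neuron_static_shape_inference: Graph size after: 13757 nodes (0), 19247 edges (0), time = 242.094ms.
  aws_neuron_fuse_supported_operators: Graph size after: 111 nodes (-13646), 158 edges (-19089), time = 29833.4043ms.
"""

# Assumed line formats, taken from the excerpt above.
UNSUPPORTED_RE = re.compile(
    r"There are (\d+) ops of (\d+) different types in the graph "
    r"that are not compiled by neuron-cc: ([^(]+)\("
)
PASS_RE = re.compile(r"(\w+): Graph size after: (\d+) nodes \((-?\d+)\)")


def parse_unsupported(log):
    """Return (op_count, [op_type, ...]) from the 'not compiled' line."""
    match = UNSUPPORTED_RE.search(log)
    if match is None:
        return 0, []
    op_types = [t.strip() for t in match.group(3).split(",") if t.strip()]
    return int(match.group(1)), op_types


def parse_passes(log):
    """Map each grappler pass name to its reported node count and delta."""
    return {
        name: {"nodes": int(nodes), "node_delta": int(delta)}
        for name, nodes, delta in PASS_RE.findall(log)
    }


if __name__ == "__main__":
    count, op_types = parse_unsupported(LOG)
    print(f"{count} unsupported ops of types: {', '.join(op_types)}")
    for name, stats in parse_passes(LOG).items():
        print(f"{name}: {stats['nodes']} nodes ({stats['node_delta']:+d})")
```

Run against the excerpt, this shows the fuse pass reporting 111 nodes after removing 13646 of the 13757, which is a different figure from the 487 operations reported as placed on the Neuron runtime.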

We also compiled an unoptimized version of the model for 1 and 4 neuron cores with similar results:

Unoptimized 4x neuron cores:

INFO:tensorflow:Number of operations in TensorFlow session: 187829
INFO:tensorflow:Number of operations after tf.neuron optimizations: 13967
INFO:tensorflow:Number of operations placed on Neuron runtime: 578

Unoptimized 1x neuron cores:

INFO:tensorflow:Number of operations in TensorFlow session: 187829
INFO:tensorflow:Number of operations after tf.neuron optimizations: 13967
INFO:tensorflow:Number of operations placed on Neuron runtime: 602

Why are so few operations making it onto the Neuron runtime?
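To put the three builds side by side, here is a small sketch that computes the share of operations placed on Neuron from the INFO lines quoted above. The BuildSummary class and the build labels are hypothetical names introduced for illustration, and using the count after tf.neuron optimizations as the denominator is an assumption; the session op count would give a different (smaller) ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildSummary:
    """Op counts copied from the three 'INFO:tensorflow' summaries above."""
    name: str
    session_ops: int      # "Number of operations in TensorFlow session"
    optimized_ops: int    # "Number of operations after tf.neuron optimizations"
    neuron_ops: int       # "Number of operations placed on Neuron runtime"

    @property
    def neuron_share(self) -> float:
        # Assumption: measure against the post-optimization op count.
        return self.neuron_ops / self.optimized_ops


BUILDS = [
    BuildSummary("optimized, 4 cores", 13757, 13758, 487),
    BuildSummary("unoptimized, 4 cores", 187829, 13967, 578),
    BuildSummary("unoptimized, 1 core", 187829, 13967, 602),
]


def format_report(builds):
    """Return one summary line per build."""
    return [
        f"{b.name}: {b.neuron_ops}/{b.optimized_ops} ops on Neuron "
        f"({b.neuron_share:.1%})"
        for b in builds
    ]


if __name__ == "__main__":
    for line in format_report(BUILDS):
        print(line)
```

With these numbers, all three builds land between roughly 3.5% and 4.3%, so neither the inference optimization nor the core count changes the picture much.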

Regards,

Patrik Ohlsson, Doremir Music Research

patriko
asked 4 months ago, 462 views
3 Answers
1
Accepted Answer

Hello,

Greetings of the day.

Thank you for using AWS services.

To better address this issue, we require details that are non-public, i.e., specific to your AWS account. I therefore request that you open a support case with AWS using the following link: https://console.aws.amazon.com/support/home#/case/create

Thank you and have a nice day.

AWS
SUPPORT ENGINEER
answered 4 months ago
1

Unfortunately, resolving this would require analyzing the unsupported operations in the graph, optimizing the model architecture for better compatibility with Neuron, and ensuring that the compilation process is properly configured, which is a lot of work.

Additionally, consulting TensorFlow Neuron's documentation and seeking support from AWS or the TensorFlow community may provide valuable insights and guidance for resolving the issue.

EXPERT
answered 4 months ago
  • Thank you for your answer! We've submitted a support ticket and will hopefully be able to provide the information needed to solve this issue.

0

I suggest trying to compile your model on an Inferentia2 or Trn1 instance. However, this may require moving your model from TF 1.x to TF 2.x.

AWS
answered 4 months ago
