The "--fast-math=none" option you are using internally casts matrix multiply operations to fp16, and may already be the best option for you. Some precision loss is expected due to the use of a lower-precision datatype. However, over a dataset such as MRPC, we see the same BERT accuracy as GPU/CPU. If you see otherwise, please file a ticket via https://github.com/aws-neuron/aws-neuron-sdk/issues or send an email to aws-neuron-support@amazon.com .
Sample-to-sample variation is expected since CPU architecture is different from Inferentia (and different from GPU), and the order of summation can lead to slightly different results. Will you be able to measure the accuracy over the evaluation data set for both CPU and Inferentia (and GPU also if it is available)?
Yes, I found some fluctuation in the CPU vs. GPU numbers as well, but that fluctuation is 100 to 1000 times smaller than what I see on Inferentia. Is that expected?

These are the first few numbers from GPU:
[-1.0692453384399414, -1.4999507665634155, 1.6326944828033447, -0.13731196522712708, -0.8026626110076904, -0.48562130331993103, -0.21466472744941711, 0.44606760144233704, ...]

These are the first few numbers from CPU:
[-1.0692460536956787, -1.4999487400054932, 1.6326937675476074, -0.13731253147125244, -0.8026641607284546, -0.48562222719192505, -0.21466375887393951, 0.4460683763027191, ...]

These are the numbers from Inferentia:
[-1.0766984224319458, -1.4989659786224365, 1.6356642246246338, -0.13928218185901642, -0.8090097904205322, -0.4883664846420288, -0.2172311544418335, 0.4422350823879242, ...]

The CPU and GPU numbers only start to differ around the sixth digit after the decimal point, but the Inferentia numbers start differing at the second digit.
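One quick way to quantify this is to compare the maximum elementwise difference between the backends. The sketch below uses only the eight values quoted above, so it is a minimal illustration rather than a full accuracy evaluation:

```python
import numpy as np

# First eight logits quoted above, one array per backend
gpu = np.array([-1.0692453384399414, -1.4999507665634155, 1.6326944828033447,
                -0.13731196522712708, -0.8026626110076904, -0.48562130331993103,
                -0.21466472744941711, 0.44606760144233704])
cpu = np.array([-1.0692460536956787, -1.4999487400054932, 1.6326937675476074,
                -0.13731253147125244, -0.8026641607284546, -0.48562222719192505,
                -0.21466375887393951, 0.4460683763027191])
inf1 = np.array([-1.0766984224319458, -1.4989659786224365, 1.6356642246246338,
                 -0.13928218185901642, -0.8090097904205322, -0.4883664846420288,
                 -0.2172311544418335, 0.4422350823879242])

cpu_gpu = np.max(np.abs(cpu - gpu))    # CPU vs GPU: on the order of 1e-6
cpu_inf = np.max(np.abs(cpu - inf1))   # CPU vs Inferentia: several thousand times larger
print(f"CPU vs GPU:        {cpu_gpu:.2e}")
print(f"CPU vs Inferentia: {cpu_inf:.2e}")
```

On these eight values the CPU/Inferentia gap is roughly three orders of magnitude larger than the CPU/GPU gap, which matches the "sixth digit vs. second digit" observation.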
Yes, this is approximately in the range that we expect.
The fundamental difference in the inf1 Neuron hardware is that all of the matrix multiplication-like operations will be performed in BF16 by default. See the mixed precision guide for more information: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/appnotes/neuron-cc/mixed-precision.html
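BF16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits, which is why results start diverging around the second or third significant decimal digit. The sketch below simulates BF16 by truncating the low 16 bits of a float32 value; this is an illustration only, since real hardware may round rather than truncate:

```python
import numpy as np

def to_bf16(x):
    """Simulate a BF16 cast by zeroing the low 16 bits of a float32.

    Illustration only: actual hardware typically uses round-to-nearest,
    which halves the worst-case error shown here.
    """
    bits = np.array([x], dtype=np.float32).view(np.uint32)
    return float((bits & np.uint32(0xFFFF0000)).view(np.float32)[0])

x = 1.6326944828033447   # one of the logits quoted in this thread
err = abs(to_bf16(x) - x)
print(f"original: {x!r}  bf16: {to_bf16(x)!r}  abs error: {err:.2e}")
```

For a value in [1, 2) the BF16 unit in the last place is 2^-7 ≈ 0.0078, so an absolute error in the second or third decimal place is exactly what this datatype predicts.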
It is sometimes possible to achieve better precision with FP16, depending on the model weights and operations. The highest-precision FP16-tuned configuration can be achieved using the following flags:
--fast-math fp32-cast-matmult-fp16 no-fast-relayout
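If you are compiling from PyTorch, flags like these are typically passed through the `compiler_args` parameter when tracing. The sketch below is a hedged example: the `torch.neuron.trace` call follows the Neuron PyTorch documentation, and the tiny `Linear` model is just a stand-in for your own model:

```python
# The FP16-tuned compiler flags suggested above
compiler_args = ["--fast-math", "fp32-cast-matmult-fp16", "no-fast-relayout"]

try:
    import torch
    import torch_neuron  # noqa: F401 -- only available in a Neuron SDK environment

    model = torch.nn.Linear(4, 4).eval()   # stand-in for your real model
    example = torch.rand(1, 4)             # example input for tracing
    model_neuron = torch.neuron.trace(model, example,
                                      compiler_args=compiler_args)
except ImportError:
    # Not running in a Neuron-enabled environment; nothing to compile.
    model_neuron = None
```

The same flags can be given directly to neuron-cc on the command line; the `compiler_args` list simply forwards them to the compiler.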
I'm getting reasonable outputs with "--fast-math=none", but without it (i.e., using the default), every value in my embedding tensor is NaN. I'm doing this to get higher throughput at the cost of some precision, but with all NaNs the output is hardly usable. How do I solve this?
You should see a noticeable speedup with either --fast-math=none or the default flags. The --fast-math=none flag disables some optimizations that can affect floating-point precision, but the model still runs on the accelerator. If this doesn't meet your performance requirements, let us know which metrics you are observing and we can see if there is more we can do.
Secondly, there was a known issue in transformers that could cause NaN values on some models with transformers>=4.20 (see: https://github.com/aws-neuron/aws-neuron-sdk/issues/474). This should be resolved as of the Neuron 2.5.0 release: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/rn.html#neuron-2-5-0-11-23-2022
With the latest Neuron release, you may be able to get the full performance benefit of the default flags while still getting accurate results. This will be model- and weight-dependent.
Hi Jonathan, --fast-math=none does work, but I would appreciate the greater speedup that the default would provide. To use the default, which currently gives all NaNs in my embedding, do I have to recompile with Neuron 2.5, or is it just the runtime? I did install neuronx-dkms v2.6 and neuronx-tools v2.6 on my inf1 instance, but there is no change; I'm still getting all NaNs. I mostly do my Inferentia Neuron model compilation on my laptop, or sometimes on a c5.12xlarge if needed. Sorry, I'm still new to all of this, so my queries may seem silly at times. :) Wishing you a Happy New Year.