Elastic cache redis network allowance exceeded

Question

We are seeing a lot of network allowance exceeded events when reading from just one or two shards. The cluster runs in cluster mode with 18 shards. We suspect that some large objects are causing these exceptions. Is there a metric, or some other way, to find the size of the values stored on an ElastiCache Redis node?

![Enter image description here](/media/postImages/original/IMMSms-wUhQsGMvHz7hJszLw)

Answer

Hello,

Thank you for your query!

According to the official AWS documentation, the metric 'NetworkBandwidthOutAllowanceExceeded' indicates the number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.

[+] https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.HostLevel.html

A spike in this metric is usually seen when heavy network traffic keeps the cluster/node running above its network baseline limit, which eventually leads to network throttling. The spikes appear while the network is being throttled. If the network bytes metrics show no visible spike, microbursting may also have taken place.

Regarding your question: unfortunately, no metric directly reports the size of keys or values in Redis. However, you can look for big keys in the cluster, since any operation on them (read, write, evict, sync) uses more system resources. redis-cli has a --bigkeys option that samples Redis keys, looking for keys with many elements (complexity).

$ redis-cli --bigkeys
[+] https://redis.io/docs/ui/cli/

In cluster mode enabled clusters, I suggest providing the individual endpoints of the primary (master) nodes instead of the cluster configuration endpoint, as shown below:

src/redis-cli -c -h  -p 6379 --bigkeys
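If you want byte sizes rather than element counts, one option is to pair SCAN with the MEMORY USAGE command against each primary node. The following is only a minimal sketch, assuming the redis-py client and that MEMORY USAGE is permitted on your engine version; the function name `largest_keys`, its parameters, and the placeholder endpoint are hypothetical and not from the AWS documentation.

```python
import heapq


def largest_keys(client, top_n=10, scan_count=1000, max_keys=100_000):
    """Return up to top_n (size_in_bytes, key) pairs, largest first.

    `client` is any object exposing redis-py style `scan_iter` and
    `memory_usage` methods. At most `max_keys` keys are inspected so the
    scan stays bounded on a busy node (an arbitrary, assumed limit).
    """
    sizes = []
    for index, key in enumerate(client.scan_iter(count=scan_count)):
        if index >= max_keys:
            break
        # MEMORY USAGE returns nothing if the key expired after SCAN saw it.
        size = client.memory_usage(key)
        if size is not None:
            sizes.append((size, key))
    return heapq.nlargest(top_n, sizes, key=lambda pair: pair[0])


# Hypothetical usage against one primary node (endpoint is a placeholder;
# add ssl=True if in-transit encryption is enabled on the cluster):
#
#   import redis
#   client = redis.Redis(host="<primary-node-endpoint>", port=6379)
#   for size, key in largest_keys(client):
#       print(size, key)
```

Running this per primary node, starting with the one or two shards that show the spikes, keeps the extra load limited to the nodes you are investigating.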

Thank you for your interest in re:Post community.

Have a great day!

Answer

The best way to troubleshoot/understand these Network*AllowanceExceeded metrics is to determine what specific impact this has had on the application. Are you seeing timeout errors or visible slowness on your application matching the timestamp of these spikes?

Since TCP is a reliable transport protocol, dropped packets are retransmitted. This is intended functionality and happens independently on inbound and outbound traffic. It is common to observe occasional spikes in these metrics. Even sustained high values (10k/min or more) are only meaningful when NetworkBytesIn/NetworkBytesOut is approaching the host's baseline network bandwidth.

If no latency issues are observed or if the numbers are fairly low, then no further action is required.
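As a rough illustration of that rule of thumb, the check could be sketched as below. This is an assumption-laden example, not an AWS formula: the function names are hypothetical, the 80% utilization cut-off is my own reading of "approaching", and the baseline bandwidth has to be looked up for your node type.

```python
def average_gbps(bytes_per_minute):
    """Convert a one-minute NetworkBytesIn/NetworkBytesOut sum to Gbit/s."""
    return bytes_per_minute * 8 / 60 / 1e9


def needs_attention(bytes_per_minute, exceeded_per_minute, baseline_gbps,
                    exceeded_threshold=10_000, utilization_threshold=0.8):
    """True when the exceeded count is high AND traffic is near the baseline.

    exceeded_threshold mirrors the 10k/min figure mentioned above;
    utilization_threshold (0.8) is an assumed value, tune it as needed.
    """
    utilization = average_gbps(bytes_per_minute) / baseline_gbps
    return (exceeded_per_minute >= exceeded_threshold
            and utilization >= utilization_threshold)
```

For example, 9 GB sent in one minute is an average of 1.2 Gbit/s; against an assumed 1.25 Gbit/s baseline, with 12,000 exceeded packets in that minute, `needs_attention` returns True.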

Note: The `NetworkBytesIn` and `NetworkBytesOut` metrics are measured at per-minute granularity. Network traffic shaping, which generates non-zero `NetworkBandwidthInAllowanceExceeded` and `NetworkBandwidthOutAllowanceExceeded` values, happens at a much finer granularity (milliseconds). Small bursts of traffic will cause some traffic shaping even if the average bandwidth is well within limits. This can happen even during a single SET or GET operation on a larger item.
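To make the granularity gap concrete, here is a small back-of-the-envelope sketch. The payload size and time window are made-up illustrative numbers, and the function names are hypothetical.

```python
def burst_gbps(payload_bytes, window_ms):
    """Rate in Gbit/s needed to move payload_bytes within window_ms."""
    return payload_bytes * 8 / (window_ms / 1000) / 1e9


def minute_average_gbps(payload_bytes):
    """Rate in Gbit/s if the same payload is averaged over a full minute."""
    return payload_bytes * 8 / 60 / 1e9


# Illustrative assumption: one 5 MB value returned within 10 ms.
payload = 5_000_000
print(f"burst:   {burst_gbps(payload, 10):.2f} Gbit/s")
print(f"average: {minute_average_gbps(payload):.5f} Gbit/s")
```

With these assumed numbers, the momentary rate is about 4 Gbit/s while the per-minute average is under 0.001 Gbit/s, which is why the per-minute metrics can look flat while the allowance-exceeded counters still increase.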
