How do I troubleshoot a decrease in the CacheHitRate metric in my ElastiCache Redis cluster?

I want to troubleshoot the decrease that I'm experiencing in the CacheHitRate metric for my Amazon ElastiCache Redis cluster.

Resolution

When the CacheHitRate decreases, usually the number of cache misses increases. For more information, see Monitoring Cache efficiency. To troubleshoot a decrease in CacheHitRate, take the following actions.

Check whether the Redis engine evicted keys

A decrease in CacheHitRate can occur when the Redis engine evicts keys to free memory. Requests for the evicted keys then result in cache misses.

To check whether the Redis engine evicted keys, review the following Amazon CloudWatch metrics:

  • Evictions
  • BytesUsedForCache
  • DatabaseMemoryUsagePercentage

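To resolve this issue, scale your cluster to add memory capacity. For example, scale up to a larger node type or, for a cluster mode enabled cluster, scale out by adding shards.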
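The metric review above can also be scripted. The following is a minimal sketch, assuming boto3 is installed, AWS credentials are configured, and the metrics are published per node under the CacheClusterId dimension. The function names, the 3-hour window, and the choice of statistic for each metric are illustrative assumptions, not part of the original guidance.

```python
from datetime import datetime, timedelta, timezone

# Metric names taken from the list above.
EVICTION_METRICS = ["Evictions", "BytesUsedForCache", "DatabaseMemoryUsagePercentage"]


def build_metric_request(metric_name, cluster_id, hours=3, period=300, now=None):
    """Build keyword arguments for the CloudWatch get_metric_statistics call."""
    end = now or datetime.now(timezone.utc)
    # Assumption: sum the eviction count per period, take the maximum for memory metrics.
    statistic = "Sum" if metric_name == "Evictions" else "Maximum"
    return {
        "Namespace": "AWS/ElastiCache",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,
        "Statistics": [statistic],
    }


def fetch_eviction_metrics(cluster_id):
    """Fetch datapoints for each metric. Requires boto3 and AWS credentials."""
    import boto3  # imported here so build_metric_request works without boto3

    cloudwatch = boto3.client("cloudwatch")
    results = {}
    for name in EVICTION_METRICS:
        response = cloudwatch.get_metric_statistics(**build_metric_request(name, cluster_id))
        # Datapoints are not guaranteed to be ordered, so sort them by time.
        results[name] = sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
    return results
```

For example, fetch_eviction_metrics("my-cluster-001") (a hypothetical node ID) returns the datapoints for each metric. A rise in Evictions that coincides with DatabaseMemoryUsagePercentage near 100 suggests that memory pressure is the cause.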

Review your key expiration configuration

If your keys expire too quickly, then you might see a spike in the Reclaimed metric. This metric lists the total number of keys that Redis removed because their time-to-live (TTL) expired. To view the underlying value, run the INFO command, and then review the expired_keys field in the output for the number of expiration events. For more information, see INFO on the Redis website.
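The INFO check described above can be sketched as a small parser. This example assumes that you have captured the raw text output of INFO (for example, from redis-cli) and reads the expired_keys field, which is the INFO statistic that the Reclaimed metric is derived from. The function names and sample values are hypothetical.

```python
def expired_keys_from_info(info_text):
    """Return the expired_keys counter from raw INFO output, or None if absent."""
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("expired_keys:"):
            return int(line.split(":", 1)[1])
    return None


def expirations_between(first_sample, second_sample):
    """Number of keys that expired between two INFO samples.

    The counter is cumulative since the server started, so a negative
    result suggests that the node restarted between the two samples.
    """
    return expired_keys_from_info(second_sample) - expired_keys_from_info(first_sample)
```

Comparing two samples taken a few minutes apart shows how quickly keys expire. A large difference relative to your request volume suggests that TTL values are too short.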

To resolve this issue, update the TTL settings for your keys. For more information, see TTL on the Redis website.

Review client updates for removed keys

If the application tries to retrieve keys that a client update removed, then you might see the following changes:

  • An increase in CacheMisses
  • A decrease in the CacheHitRate

To determine whether the client update removed keys, review the application for updates from commands such as FLUSHALL, DEL, or UNLINK. For more information, see FLUSHALL, DEL, and UNLINK on the Redis website.
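One way to review the application for these commands is to scan a list of the commands that it issued. The following is a minimal sketch, assuming that you already have the commands as plain text lines, one command per line (for example, from your own application logging). The input format and the function name are assumptions for illustration.

```python
# Commands named in the text above that remove keys.
KEY_REMOVING_COMMANDS = {"FLUSHALL", "DEL", "UNLINK"}


def find_key_removing_commands(command_lines):
    """Return (line_number, command_line) pairs whose first word removes keys."""
    matches = []
    for number, line in enumerate(command_lines, start=1):
        words = line.split()
        # Redis command names are case-insensitive, so compare in upper case.
        if words and words[0].upper() in KEY_REMOVING_COMMANDS:
            matches.append((number, line.strip()))
    return matches
```

If the matches cluster around the time that CacheMisses increased, then a client update is a likely cause of the lower hit rate.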

To reduce the accidental removal of keys, it's a best practice to use Role-Based Access Control (RBAC) to restrict which users can run these commands. Or, use the rename-commands parameter to rename commands that can cause significant issues, such as the removal of keys, so that clients can't run them by their default names. For more information on the rename-commands parameter, see ElastiCache version 5.0.6 for Redis OSS (enhanced).

Check whether ElastiCache recovered your cluster

When a cluster experiences hardware issues, ElastiCache recovers the cluster and removes all data from the cache. Because the cache is empty, requests for data from the cache result in cache misses.

To check whether ElastiCache recovered a cluster, check the Events section in the ElastiCache console for recovery events that occurred around the time that the CacheHitRate metric decreased.
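The event check above can also be done programmatically. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. Matching on the keyword "recover" is an assumption about how recovery events are worded, so review the full event list if the filter returns nothing. The function names are hypothetical.

```python
def find_recovery_events(events, keyword="recover"):
    """Return events whose Message contains the keyword (case-insensitive)."""
    return [event for event in events if keyword in event.get("Message", "").lower()]


def fetch_recent_events(cluster_id, minutes=1440):
    """Fetch the first page of events for one cache cluster. Requires boto3."""
    import boto3  # imported here so find_recovery_events works without boto3

    elasticache = boto3.client("elasticache")
    response = elasticache.describe_events(
        SourceIdentifier=cluster_id,
        SourceType="cache-cluster",
        Duration=minutes,  # how many minutes of events to retrieve
    )
    return response["Events"]
```

For example, find_recovery_events(fetch_recent_events("my-cluster-001")) lists the matching events for a hypothetical node ID. Compare the Date field of each event with the time of the CacheHitRate decrease.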

To improve data availability and enhance fault tolerance, add a read replica to the cluster and turn on Multi-AZ with the Auto failover option.

Follow caching best practices

If the CacheHits and CacheMisses metrics are both at 0, then there aren't any requests for the cache. In this case, CloudWatch doesn't display any data for the CacheHitRate metric.

It's a best practice to have a cache hit ratio of 0.8 or greater. If Redis evicts too many keys in your cache, or the requested keys have expired or don't exist, then your cache hit ratio decreases. To improve cache performance, follow caching best practices.
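The two cases above can be illustrated with a short calculation. This sketch assumes that the hit ratio is computed as hits / (hits + misses) and uses the 0.8 target from the text. The function names are hypothetical.

```python
def cache_hit_rate(hits, misses):
    """Return hits / (hits + misses), or None when there are no requests."""
    total = hits + misses
    if total == 0:
        # No requests reached the cache, so there is no ratio to report.
        return None
    return hits / total


def is_below_target(hits, misses, target=0.8):
    """True when a hit ratio exists and is lower than the target."""
    rate = cache_hit_rate(hits, misses)
    return rate is not None and rate < target
```

For example, 700 hits and 300 misses give a ratio of 0.7, which is below the target. With 0 hits and 0 misses, the function returns None, which mirrors the missing CacheHitRate data in CloudWatch.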

Related information

Metrics for Valkey and Redis OSS

AWS OFFICIAL: Updated a month ago