ElastiCache vertical scale-up strange behaviour


Hi community!

In my application, ElastiCache (Cluster mode disabled) is used in two scenarios, daily:

  1. Intense usage, for about 3 hours, during which we need improved network performance and run on a cache.m6g.2xlarge
  2. Light usage, for the rest of the day, in which a cache.m6g.large would be more than enough.

We currently run the 2xlarge 24/7, but it would be nice to scale vertically: up before the intensive hours and back down afterwards. However, when we scale up (large ⇾ 2xlarge) right before the heavy process, the instance does not behave the same as when we don't scale (keeping the 2xlarge for the whole day). For comparison, the first graph shows Network Bytes In when there is a scale up right before the process, and the second shows the same metric when there isn't:

With scale up, from cache.m6g.large to cache.m6g.2xlarge, reaching a max of 13Gb per minute

No scale up, instance cache.m6g.2xlarge reaches a max of 24Gb per minute

Note that the cache process only starts after the cluster status is set to available. This drop in the Network Bytes In rate shouldn't be happening, and it makes the scaling option impractical for us. What is the point of offering an online scaling feature that does not perform as it should after the scaling?

Has anyone experienced something similar? Do you know of any alternatives that would accomplish our goal: high performance only during the cache hours, while keeping our costs reasonable?


1 Answer

m6g instances up to m6g.4xlarge have a variable network bandwidth capacity. They support up to 10 Gbps. See https://aws.amazon.com/ec2/instance-types/m6g/. These instances can support temporary bursts in traffic rate.

When you scale up from large to 2xl, the data transfer from the old node to the new node consumes some of the available burst capacity on the new node. So after the scale up, your application is not able to use the burst.

When you don't scale up throughout the day, your application has access to the temporary burst capacity.
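To compare the two graphs with the 10 Gbps figure, the per-minute peaks can be converted to an average rate. This is only a rough sketch. It assumes the graph values are gigabytes per minute (CloudWatch reports NetworkBytesIn in bytes), and the function name and the `unit_is_bytes` flag are made up for illustration:

```python
def per_minute_to_gbps(amount_per_minute: float, unit_is_bytes: bool = True) -> float:
    """Convert a per-minute volume in giga-units to an average rate in Gbps.

    unit_is_bytes=True treats the input as gigabytes (assumption, see above);
    unit_is_bytes=False treats it as gigabits.
    """
    gigabits = amount_per_minute * 8 if unit_is_bytes else amount_per_minute
    return gigabits / 60.0  # 60 seconds per minute


# "Up to 10 Gbps" is the burst ceiling quoted for m6g sizes up to 4xlarge.
BURST_CEILING_GBPS = 10.0

# Peak values as read from the two graphs in the question.
for label, volume in [("after scale up", 13.0), ("no scale up", 24.0)]:
    rate = per_minute_to_gbps(volume)
    share = rate / BURST_CEILING_GBPS
    print(f"{label}: {rate:.2f} Gbps average ({share:.0%} of the burst ceiling)")
```

Under that assumption the peaks work out to roughly 1.7 Gbps after the scale up and 3.2 Gbps without it. These are one-minute averages, so short spikes inside a minute may be higher.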

Do you observe the same behavior when you scale up to 8xlarge?

answered 2 years ago
