Optimizing Reserved Instance Purchases for RDS when Instance Scheduling is in use

10 minute read
Content level: Intermediate

Learn in depth how Reserved Instances work, and how best to optimize your RDS purchases, especially when instances are scheduled to run only at certain times.

Among the various commitment-based discount schemes offered by AWS, two of the most commonly used are Savings Plans (SPs) and Reserved Instances (RIs). Savings Plans offer a great deal of flexibility - however they can only be purchased for certain resource types, and RDS instances are currently not among them. In times past this wouldn’t have mattered as much, as the database tier of an application was “always on” regardless of application workload, and only the application or front-end tiers would have been scaled. As cost optimization has become ever more important, it is no longer a given that the database tier runs on a 24x7 basis. This presents a new challenge for customers - when a workload consists of a number of RDS instances that run on a part-time (scheduled) basis, how do you optimize your RI purchase to maximize your savings?

To solve this problem, we first need to understand how RIs work:

  1. For this example, we have one instance of db.t3.large running MariaDB, that (hypothetically) costs $0.90 per hour when running (all costs in this article are fictional and have been chosen to make the maths easier - please always refer to AWS pricing pages here - https://aws.amazon.com/rds/pricing/ - before making your own calculations).
  2. All RDS instances (and AWS compute-based resources in general) feature either per-hour or per-second billing - though again, always check for your instance type before making your own calculations.
  3. To decide whether to purchase an RI for this instance, you need to look at its usage profile. Let’s assume for now that every day looks the same, with the instance running around the clock. Let’s also assume that the RI purchase gives you a 30% discount on the hourly instance cost, with no upfront cost (again, this makes the maths easier). The charges for a given day would then look like this:

Daily cumulative charges for a single RDS instance

In the example above, there is clearly a strong argument for purchasing an RI - at the end of a single day’s run there is a 30% saving to be realised, so assuming that this running pattern will be kept up for at least a year (the minimum term length for an RI), it is a worthwhile purchase.

Now let’s say that we have another MariaDB database - identical to the one above, except this one is used during office hours only, for running reports. If we now plot out the costs as above, the day’s charges look like this:

Daily charges for a single RDS Instance versus an RI

Running the database during office hours only (9 hours a day in our example) incurs charges of $8.10, yet an RI is billed for the whole day regardless ($15.12 at the discounted rate) - hence in this case it is clear that an RI purchase would actually cost us more money. The situation gets worse if we look over the course of a week - assuming the database isn’t needed at weekends, then zooming out to a daily view, a week’s charges would look like this:

Weekly charges for a single RDS Instance versus an RI
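The daily and weekly comparison above can be reproduced with a few lines of Python. This is only a sketch using the fictional figures from this article ($0.90 per hour, a 30% RI discount, no upfront cost, 9 running hours on each of 5 weekdays); the function and constant names are illustrative and not part of any AWS tooling.

```python
from decimal import Decimal

# Fictional figures from this article, not real AWS prices.
ON_DEMAND_RATE = Decimal("0.90")               # on-demand cost per running hour
RI_DISCOUNT = Decimal("0.30")                  # assumed 30% RI discount, no upfront fee
RI_RATE = ON_DEMAND_RATE * (1 - RI_DISCOUNT)   # effective hourly RI rate


def on_demand_cost(hours_run):
    """On-demand charges accrue only while the instance is running."""
    return ON_DEMAND_RATE * hours_run


def ri_cost(hours_in_period):
    """An RI is billed for every hour in the period, running or not."""
    return RI_RATE * hours_in_period


daily_on_demand = on_demand_cost(9)        # office hours only
daily_ri = ri_cost(24)                     # whole day
weekly_on_demand = on_demand_cost(9 * 5)   # weekdays only
weekly_ri = ri_cost(24 * 7)                # whole week

print(f"Day:  on-demand ${daily_on_demand:.2f} vs RI ${daily_ri:.2f}")
print(f"Week: on-demand ${weekly_on_demand:.2f} vs RI ${weekly_ri:.2f}")
```

Decimal is used rather than float purely so that the currency arithmetic stays exact.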

Of course, with just 1 database, it’s really easy to work out whether the RI should be purchased or not. You simply need to look at the number of hours your database runs in a given month, and work out whether it runs for more than:

  • (100% - [RI Discount]) x [hours in a month]

To complete this example, let’s assume we’re running the above database in June, which has 30 days (720 hours). We know that for an RI purchase to be worthwhile, we must run it for more than:

  • (100% - 30%) x 720 = 504 hours

If the database runs 24x7 then an RI purchase is obviously worthwhile. By contrast, in our office hours example, the database runs for 9 hours a day, 5 days a week. In June of 2023 (which has 22 weekdays), this totals 198 hours for the entire month, well below the 504-hour threshold, so we can easily see, without having to map out the hours or even the days in the month, that an RI purchase is not worthwhile.
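The break-even check can be expressed as a short Python sketch. It assumes the 30% discount and the 9-hours-per-weekday schedule used in this article; `break_even_hours` and `weekday_count` are illustrative helper names rather than part of any AWS SDK.

```python
import calendar


def break_even_hours(ri_discount_pct, hours_in_month):
    """Running hours above which an RI costs less than on-demand."""
    return (100 - ri_discount_pct) * hours_in_month / 100


def weekday_count(year, month):
    """Number of Monday-to-Friday days in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return sum(
        1
        for day in range(1, days_in_month + 1)
        if calendar.weekday(year, month, day) < 5  # 0-4 are Monday-Friday
    )


hours_in_june = 30 * 24                             # 720 hours
threshold = break_even_hours(30, hours_in_june)     # assumed 30% discount
office_hours_run = weekday_count(2023, 6) * 9       # 9 hours per weekday

print(f"Break-even: {threshold} hours, actual: {office_hours_run} hours")
print("RI worthwhile" if office_hours_run > threshold else "Stay on-demand")
```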

This simple example shows the importance of understanding your workload and RI pricing before making any purchasing decisions, as the wrong decision can quite literally cost you money. Few cloud infrastructures have just 1 database, though - it’s common for them to have many. Equally, RIs can be applied to any running instance within an AWS Organization (they are not even limited to a single account in a multi-account configuration), provided the instances are of the same instance class type and the same database engine (more details on this may be found here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithReservedDBInstances.html).

Let’s develop our example now so we can understand this. Say our organization now has 3 databases. They feature the following characteristics:

  • MariaDB is the database engine for all of them
  • db.t3 is the instance class type for all of them
  • The main production database is Multi-AZ. The reporting databases are Single-AZ.
  • Each of them runs on a different schedule
  • For simplicity, each database keeps the same schedule every day of the week - each day looks like this:

Cost profile of 3 RDS instances where scheduling is in use

In the above example, we can see that the production database runs 24 hours a day, and the other two run part time (perhaps for reporting purposes). Given this more complex example, how should we choose our optimal RI purchase? We could look at each database individually and make a decision based on that - we could even purchase the RIs individually. Whilst entirely possible, this is time-consuming and cumbersome, especially when most enterprises will have many more than 3 databases. It also misses an important point - RIs are not locked to a specific instance, but will be allocated to any matching instance that is running. Thus in the graph above, an RI of db.t3.medium Single-AZ would be automatically allocated to the gray line between 5am and 9am, and then to the blue line between 7pm and 9pm. You don’t need to configure this behavior - RI capacity will always be applied until it runs out, at which point on-demand pricing is used for any instance hours not covered by reservations. If you considered each database individually you would miss this potential optimization, so it is important to consider all the databases together.

How then do we work this out? To do this, we must look at normalized units. A normalized unit is equal to a Single-AZ RI on the “small” size in a family (db.t3.small in our case); larger instance types are built up from multiples of it, and a Multi-AZ deployment counts double. This means that in our example above, we could purchase our RIs by specifying db.t3.medium Single-AZ (2 normalized units each). If we purchase 4, we will have purchased the 8 units needed to cover the db.t3.large Multi-AZ production database. This is described in much greater depth on this page: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithReservedDBInstances.html
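As a rough illustration of how normalized units add up, the sketch below maps instance sizes to units. The size factors shown (small = 1, medium = 2, large = 4, doubled for Multi-AZ) are taken to match the AWS page linked above; treat the table as an assumption and confirm it there before relying on it. The `normalized_units` helper is hypothetical.

```python
# Assumed size factors for a Single-AZ instance; verify against the AWS docs.
SIZE_FACTORS = {"micro": 0.5, "small": 1, "medium": 2, "large": 4, "xlarge": 8}


def normalized_units(instance_class, multi_az=False):
    """Normalized units per hour for a class name such as 'db.t3.large'."""
    size = instance_class.rsplit(".", 1)[-1]   # 'db.t3.large' -> 'large'
    units = SIZE_FACTORS[size]
    return units * 2 if multi_az else units    # Multi-AZ counts double


production = normalized_units("db.t3.large", multi_az=True)
reporting = normalized_units("db.t3.medium")

print(f"Production: {production} units, each reporting database: {reporting} units")
print(f"db.t3.medium Single-AZ RIs needed for production: {production // reporting}")
```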

Number of normalized units used by each database during any given 1-hour window

The graph shows the number of normalized units used by each database during any given 1-hour window - the gray line assumes that this database runs for 30 minutes in the 5am to 6am window - otherwise whole hours are assumed for ease of calculation.

The way to work out our optimal purchase of RIs is quite easy - we need to work out the usage from the above graph as a percentage of the maximum number of units that could be consumed. Let’s work out the maximum first:

  • db.t3.large Multi-AZ = 8 normalized units x 24 hours = 192 units in a day
  • db.t3.medium Single-AZ = 2 normalized units x 24 hours = 48 units in a day

Thus the maximum unit consumption possible in our infrastructure, if all databases ran for 24 hours a day, is:

  • 192 + 48 + 48 = 288 units

However in the above graph, our actual consumption is:

  • 192 + 27 + 24 = 243 units

As a percentage:

  • (243/288) x 100% = 84.375%

We know that logically (unless expansion is planned and guaranteed to happen) our maximum purchase should be:

  • 8 + 2 + 2 = 12 normalized units of db.t3.small Single-AZ

However, as two of our databases are running part time, it’s almost certain that purchasing 12 units won’t be optimal. Using the percentage above, our predicted purchase should be:

  • 84.375% x 12 = 10.125 units of db.t3.small Single-AZ

Naturally, you can’t buy 0.125 of a database, so we’ll round the answer (down in this case) to 10 units.
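The sizing arithmetic above can be sketched in Python as follows. The daily unit totals (192, 27 and 24) are read directly from this article's example; the variable names are illustrative only.

```python
# Units consumed per day if every database ran for all 24 hours.
max_daily_units = 192 + 48 + 48

# Units actually consumed per day under the schedules in the example.
actual_daily_units = 192 + 27 + 24

usage_ratio = actual_daily_units / max_daily_units

# Upper bound on the purchase: every database covered around the clock.
max_purchase_units = 8 + 2 + 2

predicted_units = usage_ratio * max_purchase_units
recommended_units = round(predicted_units)   # whole units only

print(f"Usage: {usage_ratio:.3%} of maximum")
print(f"Predicted purchase: {predicted_units} units, rounded to {recommended_units}")
```

Rounding to the nearest whole unit matches the choice made in this example; whether to round up or down in a borderline case is a judgment call that depends on how stable the schedule is expected to be.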

How can we prove that this is the right decision though? Indeed, once our infrastructure is operational, how can we know that our usage patterns are still optimal, especially if the schedule changes over time? Fortunately there are two metrics (both available via your Cost Explorer console) that show whether your RI purchase is being optimally used - these are Coverage and Utilization.

Coverage is defined as the proportion of your running workload that is covered by a reservation. Utilization is the converse - it is the proportion of your reservation purchases being used by active workloads. They can be calculated as follows, where “RI time used” means the time (in normalized units) consumed by your running instances:

  • Coverage = 100% or ( (RI time purchased) x 100% ) / (RI time used) - whichever is the lower
  • Utilization = 100% or ( (RI time used) x 100% ) / (RI time purchased) - whichever is the lower
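A minimal sketch of these two formulas for a single hour is shown below. It assumes both quantities are expressed in normalized units, and that an hour with nothing running (or nothing purchased) reports 0% rather than raising an error; that zero handling is an assumption of this sketch, not documented Cost Explorer behavior.

```python
def coverage(purchased_units, running_units):
    """Percentage of the running workload covered by reservations."""
    if running_units == 0:
        return 0.0   # assumption: nothing running means nothing to cover
    return min(100.0, purchased_units * 100 / running_units)


def utilization(purchased_units, running_units):
    """Percentage of the purchased reservations used by running workload."""
    if purchased_units == 0:
        return 0.0   # assumption: nothing purchased means nothing to use
    return min(100.0, running_units * 100 / purchased_units)


# With 10 units purchased: an hour where all three databases run (12 units)
# and an hour where only the production database runs (8 units).
for running in (12, 8):
    print(
        f"{running} units running: "
        f"coverage {coverage(10, running):.1f}%, "
        f"utilization {utilization(10, running):.1f}%"
    )
```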

So, let’s say that we’ve decided to purchase 10 units of db.t3.small Single-AZ as calculated above - we’ll adapt the previous graph to include the hourly coverage and utilization figures:

A good example of a well optimized purchase

The graph above is a good example of a well optimized purchase - both Coverage and Utilization figures are consistently high, though it is not realistic to expect 100% for either value all of the time in a scheduled environment. Note that the average Coverage and Utilization figures for the whole 24 hour period are the same - this should be the case if your purchase was optimal, though again you may find it’s not possible to meet this exactly - in short though, the closer these two values are over a given time period, the more optimized your purchase. We can also validate our RI purchase decision by adding its cost over the course of a day to one of our previous graphs:

Adding RI purchase cost to our previous 24 hour example

One final word of warning on the average Coverage figure - to match the values generated in Cost Explorer, this is calculated based on the number of time periods where the Coverage is greater than 0%. This is the case in every hour considered above, and will commonly be the case for most environments - however in the event that you have time periods where Coverage is zero, you must divide the total of all coverages by the number of non-zero time periods.
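The averaging rule described above can be sketched as a small helper. The hourly values in the usage example are hypothetical and chosen only to show the effect of excluding zero-coverage periods; `average_nonzero` is an illustrative name.

```python
def average_nonzero(values):
    """Average over periods with a value above zero, as described above."""
    nonzero = [value for value in values if value > 0]
    if not nonzero:
        return 0.0   # assumption: report 0% when no period has any coverage
    return sum(nonzero) / len(nonzero)


# Hypothetical hourly coverage percentages, including one idle hour.
hourly_coverage = [100.0, 0.0, 50.0]

plain_average = sum(hourly_coverage) / len(hourly_coverage)
adjusted_average = average_nonzero(hourly_coverage)

print(f"Plain average: {plain_average:.1f}%, non-zero average: {adjusted_average:.1f}%")
```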

You can expand the examples and calculations provided here for your own purposes, and in that way you will be able to ensure you are making the best purchasing decisions for your environment.