Question on Buffer Cache Hit Ratio

0

I'm investigating an issue in a DocumentDB instance that is leading to intermittent high CPU use spikes. My hunch is I'm dealing with inefficient queries. I'm trying to understand the role of the BufferCacheHitRatio metric as I never go below 99.9% for that value. Does the database still have to process a query before it looks into its buffer cache for the data?

Other information that might be important: I'm seeing DatabaseCursorsTimedOut going above 0 at times as well as occasional hitting of max cursors allowed for the DB instance tier.

hcastro
asked 2 years ago733 views
1 Answer
3
Accepted Answer

Ideally, reads from disk are slower than that of memory. Index pages are paged into memory based on the queries that are run against the database. When you first start up the cluster, its buffer cache is empty and will fill up with pages as your workload is queried against the db. If an index size is larger than the amount of memory (index is unable to fit into memory), the database has to purge and read from disk to iterate through the index (this is the same for data pages). Ideally, you want to have enough RAM on your instance so that both your data pages and index pages fit in memory.

"Does the database still have to process a query before it looks into its buffer cache for the data?":

The query still has to process (query plan), but the fetching of the results is determined on the location of the pages (Mem vs storage), the way the query processes is determined by the structure and type of query.

Generally speaking, the value of BufferCacheHitRatio should be as high as possible (as close to 100% as possible), as reading data from working set memory is faster and more cost-effective than reading from the storage volume. If your indexes do not fit into memory, you will see a lower BufferCacheHitRatio. Continually reading from disk incurs additional I/O costs and is not as performant as reading from memory. You can monitor VolumeReadIOPS to determine if disk is being hit more than expected.

A key rule to remember is that, if you notice an increase in diskIOPS + increase in cost + decrease in the BufferCacheHitRatio and IndexBufferCacheHitRatio, then your best bet is buffer is not being utilized.

As CPU is high and BufferCacheHitRatio doesn't go <99.99%, in order to determine if it is the query's causing the increase in CPU, I recommend enabling performance insights [1] on the cluster, this will allow you to monitor the Vcpu's being utilized by the queries. The profiler log will also assist with determining the execution time for queries [2]. I also added [3] that will assist you in determining the root cause for the increased CPU.

References: [1] Performance insights Documentdb: https://docs.aws.amazon.com/documentdb/latest/developerguide/performance-insights.html

[2] Profiling Documentdb: https://docs.aws.amazon.com/documentdb/latest/developerguide/profiling.html

[3] Determining high CPU documentdb: https://docs.amazonaws.cn/en_us/documentdb/latest/developerguide/user_diagnostics.html#user_diagnostics-cpu_utilization

AWS
SUPPORT ENGINEER
answered 2 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions