Trying to diagnose why 1 second query takes longer under server load

0

We are running a RDS MySQL database on a db.t2.medium instance in eu-west-1b. We have a EC2 t2.medium running in eu-west-1a (might or might not be relevant).

We have a stored procedure, that has a fairly complex query and under minimal server load takes 1 second to execute (2 seconds to return dataset to the calling app/service). This is fine for our purposes in terms of the query taking 1 second.

'Calling SP Single App.png' shows my test console app calling the stored procedure repeatedly.

However, we've noticed that when the database server is under load in terms of number of requests, this same query can take anywhere between 3 and 5 seconds plus to execute.

'Calling SP Multi Apps.png' shows 4 copies of my test console app running and calling the stored procedure repeatedly. You can see the different in the query times.

Not only are the query times slower, but the RDS CPU is spiking 60/70%+. 'CloudWatch RDS Metrics.png' shows the CloudWatch metrics, and recently caused our RDS instance to become unstable with 100s of connections.

We are trying to figure out why this query when called multiple times under high load causes the CPU to spike and the query time to increase.

  • Do we need to consider upgrading our instance types
  • Are we being restricted by network bandwidth

Any help greatly received.

neildt
asked 5 years ago887 views
5 Answers
0

Hi,
Your CloudWatch metrics show that your CPU bottlenecking at 100%. dbt2.medium is a very low-end Burstable 2vCPU EC2 instance with only 4 GB RAM. Please upgrade your instance and re-benchmark. If for some reason you are still having issues, I would recommend enabling RDS "Performance Insights" to help debug/tune.
-randy

answered 5 years ago
0

Could this issue be caused by lack of columns having the relevant INDEX?

neildt
answered 5 years ago
0

There are a few things at play here, and some suggestions which may help resolve the issue.

Given the graphs shared, I am sure that this is an issue, but performance graphs showing the effects might not be the most useful in this type of issue to resolve the underlying problem. Also keep in mind that stored procedures can be notorious for behaving in a single-threaded manner, which is not to say that is the case but rather that it is a consideration. I would strongly urge some research into the performance effects of stored procedures as a starting point to evaluate how effective the solution is for the specific architecture.

Taking the stored procedure out of the equation, I would recommend the following:

  • Examine all of the queries of the stored proc, and run them through an explain plan to see if they are using appropriate indexes.
  • Examine the architecture of each of the tables from specific queries involved, and are there joins that are not indexed, columns in the where clause that do not have proper indexes, what are the sizes of the tables involved, and can archiving be utilized? In some cases, it is not the index that is the issue, rather the sheer volume of rows that need to be searched to retrieve those that fit the search criteria.
  • Do any of the tables contain large blobs or text fields? Those types of columns should be split out of possible allowing for a faster retrieve of the relevant rows, leaving the actual raw data for a more specific pull.

Granted much of this is generic and best-practice advice, but might serve as a good starting point to determine where the general bottlenecks are located.

answered 5 years ago
0

The Cloudwatch graphs already show CPU hitting a 100% CPU which could easily be the reason for the slowdown.

Performance Insights would give more details on CPU demand and how that affects the queries.

Both Performance Insights and Enhanced Monitoring (EM) give Swap In & Out metrics which will further indicate if there are memory pressure issues. Performance Insights uses MySQL's Performance Schema (PFS) which uses more memory so if the system already has memory issue PFS could increase that pressure. Enabling EM doesn't uses PFS so would be a good way to first check memory before enabling PI. There should be little to no Swap IN if the systems has sufficient memory.

Kyle

answered 5 years ago
0

This issue has now been resolved. It turned out that we needed to add additional indexes on two of our tables. Thank you.

neildt
answered 5 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions