Increased wait times for MariaDB RDS


I've been given maintenance responsibility for an API that uses a MariaDB (10.0.21) db.t3.2xlarge RDS instance. Today, for no apparent reason (I checked the number of requests coming into the API: no anomaly, no code changes, no database changes), the database became slow to the point where even getting an access token for the API took 4-6 seconds (normally it would be under 500ms).
From Performance Insights (for the last 5h) I can see that, in the database load by waits (AAS), the big majority of the wait is on "synch/mutex/aria/PAGECACHE::cache_lock" (29.92), followed by CPU (13.87), and the SQL that creates all of this load is a SELECT from a view that had worked just fine up to this point.
The graphs:
https://imgur.com/B5baGzx
https://imgur.com/BTU6MAf

Edited by: NicolaeGP on Oct 23, 2019 3:24 AM
Added the links to the graph images

asked 4 years ago · 390 views
4 Answers

I couldn't find any information in Google searches about this, but I was wondering if it was some type of memory pressure issue.
I ran a made-up sort

select * from persons a, persons b, persons c, persons d, persons e order by a.firstname;

using data from

http://www.moinne.com/blog/downloads/countries_persons_tables_2012-11-28.zip

and ran it from a few sessions, and got the same kind of load:

CPU
synch/mutex/aria/PAGECACHE::cache_lock

What does the explain plan of the query look like? Are the indexes it needs in place? Is it doing sorting?
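For the made-up sort above, a quick way to check all three things is a sketch like this (table and column names are from the sample data; the index name is hypothetical):

```sql
-- Show the plan: look for full scans ("type: ALL", i.e. missing indexes)
-- and for "Using temporary; Using filesort" in the Extra column.
EXPLAIN
select * from persons a, persons b, persons c, persons d, persons e
order by a.firstname;

-- If firstname is unindexed, an index may let the ORDER BY avoid the
-- filesort (hypothetical index name):
CREATE INDEX idx_persons_firstname ON persons (firstname);
```

A filesort on a large intermediate result is exactly the kind of operation that hammers the Aria pagecache, since MariaDB uses Aria for internal temporary tables.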

Kyle

answered 4 years ago

All the graphs I could get on the RDS:
https://imgur.com/a/8XO04gZ

The ANALYZE output for the select:
https://imgur.com/S9uQJpW

I'm not a database expert, but I guess some of this is not done properly. The question still remains though: why does this happen all of a sudden? We do have automatic minor engine upgrades enabled, but I checked and there were no updates in the past week.
I also checked the process list for the database and could see the SELECT from the v_links view stuck in "Sending data".

I've also run the query that was stuck in "Sending data" by itself, and it finished in 15ms.
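For reference, the process-list check can be done from any client session (a sketch; the view name v_links is from my setup):

```sql
-- Full process list, including the complete query text
SHOW FULL PROCESSLIST;

-- Or filter to sessions stuck in a given state
SELECT id, time, state, info
FROM information_schema.PROCESSLIST
WHERE state = 'Sending data';
```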

After doing some research I've found these 2 database statuses:
Aria_pagecache_read_requests | 11323551304
Aria_pagecache_reads | 834127724
The ratio is roughly 1:13 (about one physical read per 13 read requests), which might be too low a hit rate if Aria behaves the same as MyISAM in this regard: https://mariadb.com/kb/en/library/optimizing-key_buffer_size/
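The counters came from SHOW STATUS, and the miss ratio can be computed in the same session (a sketch; the linked MyISAM guideline suggests a key-cache miss ratio well below this):

```sql
-- Raw counters (Aria_pagecache_read_requests, Aria_pagecache_reads)
SHOW GLOBAL STATUS LIKE 'Aria_pagecache_read%';

-- Miss ratio: physical reads per read request
-- (~0.074 here, i.e. roughly 1 disk read per 13 requests)
SELECT 834127724 / 11323551304 AS aria_pagecache_miss_ratio;
```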
However, when I tried to test this by increasing aria_pagecache_buffer_size to something larger than the standard 128MB, I realized that the parameter is not present in the RDS parameter group, and it is read-only when trying to change it directly on the database. Is the small size of the Aria pagecache buffer what impacted my performance in such a way?
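What I tried, for reference (the SET fails because aria_pagecache_buffer_size is a startup-only variable, and RDS does not expose it in the parameter group either):

```sql
-- Current value (the default 128MB = 134217728 bytes)
SHOW GLOBAL VARIABLES LIKE 'aria_pagecache_buffer_size';

-- This fails with a read-only-variable error; the setting can only
-- be changed at server startup, which RDS does not allow here.
SET GLOBAL aria_pagecache_buffer_size = 268435456;
```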

answered 4 years ago

After getting in touch with AWS support and a long investigation, we discovered that max_tmp_table was set to 32. Since we had a large number of concurrent requests on a view that creates multiple temporary tables, 32 was too small and temporary tables started to spill to disk; hence the Aria cache wait time increased until the database was effectively hanging.
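For anyone hitting the same thing, the temp-table spill shows up in standard MariaDB status counters; a sketch of the checks we used:

```sql
-- Current temp-table-related limits
SHOW GLOBAL VARIABLES LIKE 'max_tmp%';

-- How many temporary tables were created, and how many of them
-- spilled to disk (disk-backed temp tables use Aria on MariaDB,
-- which is why the spill surfaced as Aria pagecache waits)
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
```

A rising Created_tmp_disk_tables relative to Created_tmp_tables is the signal that in-memory temp tables are overflowing to disk.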

answered 4 years ago

Thanks for the followup.
Kyle

answered 4 years ago
