Why did aurora spike and lock up?

0

This morning our aurora mysql serverless 2 spiked and basically locked up. We have a writer/reader and a reader associated with the cluster.

On the reader we had an intense select query. This is the query RDS shows (field names changed), the point is it has multiple FIND_IN_SET functions and insights show it examined 109826.50 rows.

SELECT thisstuff ( thistime ) AS thestuff , COUNT ( * ) AS count FROM mytable WHERE todo = ? AND someID = ? AND ( ( FIND_IN_SET ( otherID , ? ) ) || ( FIND_IN_SET ( otherID , ? ) ) || ( FIND_IN_SET ( otherID , ? ) ) ) AND ( thistime BETWEEN ? AND ? ) AND some_status = ? GROUP BY months_todo ( thistime )

At the same time our writer had the following waits going on. wait/io/redo_log_flush synch/sxlock/innodb/hash_table_locks wait/io/table/sql/handler

Can somebody please help me understand why this happened, as in did the writer lock the tables or did the FIND_IN_SET lock the writer? The writer was making updates to the same table at the time the query was going on. Most importantly there a way to prevent this?

Thank You Mike

mmoore
已提問 3 個月前檢視次數 114 次
1 個回答
1
已接受的答案

It seems there were a few factors contributing to the slow SELECT query performance on your Aurora MySQL serverless cluster:

The query was doing multiple FIND_IN_SET operations which can be expensive, especially if the tables are large and it has to examine a lot of rows.

At the same time, the writer was experiencing waits on redo log flushes, table locks and innodb hash table locks. This indicates the server was under heavy load and experiencing locking contention from concurrent DML operations.

You mentioned the cluster spiked and locked up, so the resources were likely overutilized as well. Aurora Serverless clusters are meant to auto scale but may not have been able to keep up with sudden spikes in load.

Some things you could try:

  • Check CPU, memory and I/O metrics to identify resource bottlenecks
  • Add indexes on columns used in WHERE, JOIN and ORDER BY clauses to avoid full table scans
  • Rewrite the query using temporary tables or subqueries to avoid multiple FIND_IN_SET calls
  • Increase the capacity settings if spikes are frequent to allow more headroom
profile picture
專家
已回答 2 個月前
profile picture
專家
已審閱 2 個月前

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南