RDS Read IOPS, Read Throughput, Queue Depth sudden increase

3

I am seeing a sudden spike on an AWS RDS PostgreSQL db.t3.small instance: read IOPS climb to 3000, draining my IOPS burst balance until it is empty, and then stay at around 300 IOPS. I've tried rebooting and increasing the gp2 volume size, with no luck. Upgrading the instance partially mitigated the issue, in that the spike no longer drains the burst balance completely.

Symptoms:

  • Spike in read IOPS, read throughput, and queue depth every 15 minutes
  • Started only 3 days ago
  • When the spike occurs, CPU usage spikes as well

I've tried:

  • disabling autovacuum, but it only reduced write IOPS
  • upgrading, to no avail; I am still on PostgreSQL 10.17

System:

  • AWS RDS with PostgreSQL 10.17
  • some 2000-3000 databases (we have one database per user)
  • 170 GB gp2 usage

Monitoring: https://imgur.com/a/tPCDBUC
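For anyone who wants to confirm the 15-minute cadence from the raw numbers rather than from the graphs, here is a minimal sketch in Python. It assumes boto3 is installed and AWS credentials are configured; the function names, the 3-hour window, and the threshold are illustrative choices, not something from the original post.

```python
from datetime import datetime, timedelta, timezone


def spike_intervals_minutes(datapoints, threshold):
    """Return the gaps, in minutes, between the starts of consecutive spikes.

    datapoints: iterable of (timestamp, value) pairs, in any order.
    A spike starts at a sample >= threshold whose predecessor was below it.
    """
    starts = []
    previous_above = False
    for timestamp, value in sorted(datapoints):
        above = value >= threshold
        if above and not previous_above:
            starts.append(timestamp)
        previous_above = above
    return [(b - a).total_seconds() / 60 for a, b in zip(starts, starts[1:])]


def fetch_read_iops(instance_id, hours=3):
    """Fetch per-minute maximum ReadIOPS for one RDS instance from CloudWatch."""
    import boto3  # imported lazily so the helper above works without AWS access

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="ReadIOPS",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    return [(p["Timestamp"], p["Maximum"]) for p in response["Datapoints"]]
```

Calling `spike_intervals_minutes(fetch_read_iops("my-instance"), threshold=2000)` (with `my-instance` replaced by a real instance identifier) should return values close to 15 if the spikes really are periodic. Note that per-minute datapoints depend on the monitoring granularity and retention available for the instance.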

  • fikrimi, we seem to be seeing relief on our end. Wondered if you were seeing the same?

  • looks good as of 31 Jan

fikrimi
asked 2 years ago · 4320 views
5 Answers
1

We're absolutely having the same problem over here: https://repost.aws/questions/QUIK9smsSVTriFJ3ynQ8w7iw#AN9SjEmXfuSmeQpjJhnhPF_w

We're reluctant to pay for AWS Support for something that seems like it's on AWS's end.

ssmith
answered 2 years ago
1

Additionally, we're now seeing the same thing on another database in the same AZ, one that is disconnected from the application. It started just before midnight US/Eastern, spiked to 3000 IOPS, and drained almost all of our burst credits.

ssmith
answered 2 years ago
0

From what you described above, this may be an issue with the underlying EBS volumes. Please raise a support case if you have not already done so, so that AWS Support can look at your instance and help.

Bakul_R
answered 2 years ago
  • Should I get the Developer Support plan?

0
answered 2 years ago
-5

A 170 GB gp2 volume has burst performance of up to 3000 IOPS and a base performance of 170 x 3 = 510 IOPS. You can find details here:

AWS
answered 2 years ago
  • Yes, I know that. The problem isn't the base performance or burst capacity; it's the spike that goes up to 3000 IOPS for a minute, every 15 minutes, and that started suddenly just 3 days ago.
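To make the gp2 arithmetic in this answer concrete, here is a rough sketch in Python. The 3 IOPS per GiB and 3000 IOPS burst figures come from the answer itself; the 100 IOPS minimum baseline, the 16,000 IOPS cap, and the 5.4 million credit bucket are gp2 figures assumed here and not mentioned in the thread. The function names are made up for illustration.

```python
GP2_IOPS_PER_GIB = 3           # baseline IOPS earned per GiB (from the answer)
GP2_BURST_IOPS = 3000          # burst ceiling (from the answer)
GP2_MIN_BASELINE_IOPS = 100    # assumed gp2 minimum baseline
GP2_MAX_BASELINE_IOPS = 16000  # assumed gp2 maximum baseline
GP2_CREDIT_BUCKET = 5_400_000  # assumed size of a full I/O credit bucket


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume of the given size."""
    scaled = size_gib * GP2_IOPS_PER_GIB
    return min(GP2_MAX_BASELINE_IOPS, max(GP2_MIN_BASELINE_IOPS, scaled))


def seconds_to_drain(size_gib, demand_iops=GP2_BURST_IOPS, balance_pct=100.0):
    """Seconds until the burst balance is empty under a sustained demand.

    Credits are spent at (demand - baseline) per second; if the demand does
    not exceed the baseline, the balance never drains.
    """
    baseline = gp2_baseline_iops(size_gib)
    demand = min(demand_iops, max(GP2_BURST_IOPS, baseline))
    net_spend = demand - baseline
    if net_spend <= 0:
        return float("inf")
    return GP2_CREDIT_BUCKET * (balance_pct / 100.0) / net_spend
```

Under these assumptions, `gp2_baseline_iops(170)` gives 510, matching the figure above, and `seconds_to_drain(170)` comes out at roughly 36 minutes of sustained 3000 IOPS from a full bucket.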
