When I perform a point-in-time restore operation of my Amazon Relational Database Service (Amazon RDS) DB instance, the operation takes a long time to complete.
Resolution
To track the progress of your point-in-time restore, view the Amazon RDS log files for the DB instance that you restore from backup.
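To view the log files programmatically, you can use the AWS SDK. The following is a minimal sketch with Python (Boto3); the Region, DB instance identifier, and log file name are placeholder assumptions for your environment.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# List the log files that are available for the restored DB instance.
response = rds.describe_db_log_files(DBInstanceIdentifier="my-restored-instance")
for log_file in response["DescribeDBLogFiles"]:
    print(log_file["LogFileName"], log_file["Size"])

# Download the most recent lines of a log file to check the recovery progress.
portion = rds.download_db_log_file_portion(
    DBInstanceIdentifier="my-restored-instance",
    LogFileName="error/postgresql.log.2022-06-01-13",
    NumberOfLines=200,
)
print(portion["LogFileData"])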
Amazon RDS uploads transaction logs for DB instances to Amazon Simple Storage Service (Amazon S3) every 5 minutes. During a point-in-time restore, Amazon RDS first restores from the snapshot that's closest to, and earlier than, the time that you set. Then, Amazon RDS applies the transaction logs up to the point in time that you set. The duration of the operation depends on the number of transaction logs that must be applied.
For example, suppose that you perform a point-in-time restore to 06:15 UTC, and the automated backup of the DB instance was taken at 04:00 UTC on the same day. In this example, the instance restores from the backup that was created at 04:00 UTC. Then, Amazon RDS applies the transaction logs up to 06:15 UTC to the restored instance to complete the point-in-time restore process.
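The following sketch shows how you might start this example restore with Python (Boto3). The instance identifiers and Region are placeholders; Amazon RDS selects the base backup and applies the transaction logs as described above.

import boto3
from datetime import datetime, timezone

rds = boto3.client("rds", region_name="us-east-1")

# Restore to 06:15 UTC. Amazon RDS restores from the closest earlier backup
# (04:00 UTC in this example) and then replays transaction logs up to the
# requested time.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="my-source-instance",    # placeholder
    TargetDBInstanceIdentifier="my-restored-instance",  # placeholder
    RestoreTime=datetime(2022, 6, 1, 6, 15, tzinfo=timezone.utc),
)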
To reduce the time that it takes to restore your DB instance to a specific time, use the following best practices:
- Take manual snapshots at regular intervals to decrease the recovery point objective for source instances that have automated backups turned on (see the snapshot sketch after this list).
- Make sure that the source database doesn't have long-running queries in progress at the time of the point-in-time restore. Long-running transactions can increase the recovery time and cause the database to be unavailable for longer.
- Check your transaction log size. A large transaction is written to a single transaction log file, and Amazon RDS doesn't split the transaction across multiple files.
Note: Large transaction log files increase the crash recovery time.
- Run a full-table scan, such as SELECT *, on each important table to speed up access to those tables after a restore. This prompts Amazon RDS to download all of the table data from Amazon S3 to the Amazon Elastic Block Store (Amazon EBS) volume (see the warm-up sketch after this list).
Note: If you don't download all of the table data and then try to access an EBS block that's not yet loaded, lazy loading occurs. After your instance restores from a snapshot, the snapshot data in Amazon S3 copies to the EBS volume in the background. When you access a block that's not already copied, Amazon RDS pulls that block from Amazon S3 on demand, and this results in I/O latency.
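The following is a minimal sketch of taking a manual snapshot on a schedule with Python (Boto3), as mentioned in the first item. The instance identifier and Region are placeholders, and the scheduling mechanism (for example, a cron job or an Amazon EventBridge rule) is up to you.

import boto3
from datetime import datetime, timezone

rds = boto3.client("rds", region_name="us-east-1")

# Create a manual snapshot of the source DB instance (identifier is a placeholder).
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H-%M")
rds.create_db_snapshot(
    DBInstanceIdentifier="my-source-instance",
    DBSnapshotIdentifier=f"my-source-instance-manual-{timestamp}",
)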
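For the full-table scan described in the last item, the following sketch warms a list of important tables on a restored RDS for PostgreSQL instance with Python and psycopg2. The connection settings and table names are placeholder assumptions for your environment.

import psycopg2

# Connection settings are placeholders for your restored DB instance.
conn = psycopg2.connect(
    host="my-restored-instance.abc123.us-east-1.rds.amazonaws.com",
    port=5432,
    dbname="mydb",
    user="myuser",
    password="mypassword",
)

tables = ["orders", "customers", "line_items"]  # important tables to warm

with conn:
    for table in tables:
        # A server-side (named) cursor streams the rows so that the client doesn't
        # hold the whole table in memory. The scan itself prompts Amazon RDS to pull
        # every block of the table from Amazon S3 into the EBS volume.
        with conn.cursor(name=f"warm_{table}") as cur:
            cur.execute(f"SELECT * FROM {table}")
            while cur.fetchmany(10000):
                pass

conn.close()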
When you restore an Amazon RDS for PostgreSQL DB instance to a specific point in time, the PostgreSQL log file shows the progress of the recovery.
Example output:
2022-06-01 13:16:19 UTC::@:[8613]:LOG: starting point-in-time recovery to 2022-06-01 12:54:30+00
2022-06-01 13:16:19 UTC::@:[8613]:LOG: redo starts at 0/48A3220
waiting for 000000010000000000000001 archive /rdsdbdata/log/restore/pg-wal-archive.1.* to be downloaded
2022-06-01 13:17:22 UTC:127.0.0.1(46110):rdsadmin@rdsadmin:[10322]:FATAL: the database system is starting up
2022-06-01 13:17:25 UTC::@:[8613]:LOG: restored log file "000000010000000000000001" from archive
recovering 000000010000000000000002
2022-06-01 13:17:26 UTC::@:[8613]:LOG: restored log file "000000010000000000000002" from archive
recovering 000000010000000000000003
2022-06-01 13:17:28 UTC::@:[8613]:LOG: restored log file "000000010000000000000003" from archive
recovering 000000010000000000000004
2022-06-01 13:18:54 UTC::@:[8613]:LOG: restored log file "000000010000000000000022" from archive
recovering 000000010000000000000023
.
.
2022-06-01 13:33:16 UTC::@:[8613]:LOG: restored log file "00000001000000060000000B" from archive
2022-06-01 13:33:16 UTC::@:[8613]:LOG: recovery stopping before commit of transaction 9266438, time 2022-06-01 12:56:14.648042+00
2022-06-01 13:33:16 UTC::@:[8613]:LOG: redo done at 6/2C0003C0
2022-06-01 13:33:16 UTC::@:[8613]:LOG: last completed transaction was at log time 2022-06-01 12:51:14.646151+00
recovering 00000002.history
2022-06-01 13:33:16 UTC::@:[8613]:LOG: selected new timeline ID: 2
2022-06-01 13:33:16 UTC::@:[8613]:LOG: archive recovery complete
recovering 00000001.history
2022-06-01 13:33:16 UTC::@:[8620]:LOG: checkpoint starting: end-of-recovery immediate wait
2022-06-01 13:33:16 UTC::@:[8620]:LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 8 recycled; write=0.002 s, sync=0.003 s, total=0.031 s; sync files=2, longest=0.003 s, average=0.002 s; distance=655360 kB, estimate=1611806 kB
2022-06-01 13:33:16 UTC::@:[8607]:LOG: database system is ready to accept connections
2022-06-01 13:37:18 UTC::@:[8607]:LOG: received fast shutdown request
2022-06-01 13:37:18 UTC::@:[8607]:LOG: aborting any active transactions
2022-06-01 13:37:18 UTC::@:[8607]:LOG: background worker "logical replication launcher" (PID 7394) exited with exit code 1
2022-06-01 13:37:18 UTC::@:[8620]:LOG: shutting down
2022-06-01 13:37:18 UTC::@:[8620]:LOG: checkpoint starting: shutdown immediate
2022-06-01 13:37:18 UTC::@:[8620]:LOG: checkpoint complete: wrote 9 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.003 s, sync=0.003 s, total=0.024 s; sync files=7, longest=0.003 s, average=0.001 s; distance=65535 kB, estimate=1457179 kB
2022-06-01 13:37:20 UTC::@:[8607]:LOG: database system is shut down
2022-06-01 13:37:24 UTC::@:[10870]:LOG: starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
2022-06-01 13:37:24 UTC::@:[10870]:LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-06-01 13:37:24 UTC::@:[10870]:LOG: listening on IPv6 address "::", port 5432
2022-06-01 13:37:24 UTC::@:[10870]:LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-06-01 13:37:24 UTC::@:[10875]:LOG: database system was shut down at 2022-06-01 13:37:18 UTC
2022-06-01 13:37:24 UTC::@:[10870]:LOG: database system is ready to accept connections
Related information
Amazon EBS snapshots
Why is my Amazon Aurora DB cluster clone, snapshot restore, or point in time restore taking so long?