How do I avoid the long point-in-time recovery period of my Amazon RDS for MySQL instance?

4 minute read
0

I want to avoid a long point-in-time recovery (PITR) time period of my Amazon Relational Database Service (Amazon RDS) for MySQL instance.

Short description

Point-in-time recovery (PITR) allows you to restore a DB instance to a specified point in time. When you initiate a PITR, the most recent backup (either automated or manual) is restored. The factors that affect PITR time are:

  • The required time to perform the latest snapshot restore.
  • The quantity and size of the transaction logs that exist between the time of the last automated snapshot and the PITR API call.
  • The applied contents of the transaction logs that are part of the instance recovery.
  • The types of DB instance classes. Different instance classes might have different I/O throughput that might affect the log applying performance.
  • Long running transactions on the source database at the specified PITR time.

The following are common scenarios that cause PITR to fail:

  • DB instances that are in a storage full state.
  • You use MyISAM in MySQL. MyISAM storage engines don't support crash recovery processes and MyISAM tables don't allow PITR to work properly.
  • You use temporary tables, in-memory tables, and distributed transactions. For more information, see Diagnosing and resolving point-in-time restore failures.
  • You set sync_binlog to 0. This setting might reduce the performance overhead that's required to sync the binary logs. Also, when there's a power failure or operating system (OS) crash, some commits might not sync to the binary logs. This asynchronization might affect PITR capabilities. It's a best practice to set sync_binlog to 1 so that you don't lose events from binary logs. For more information, see sync_binlog on the MySQL website.

Resolution

To avoid a long PITR period, use the following best practices:

  • Implement a disaster recovery strategy.
  • Run transactions in smaller batches, and run the COMMIT command frequently.
  • Take a snapshot before and after large transactions that run. Transactions that are larger than the max_allowed_packet parameter cause PITR to fail. For more information, see max_allowed_packet on the MySQL website.
  • Minimize snapshot restore times. Snapshot restores are initiated as part of the PITR process. A longer snapshot restore might contribute to a longer PITR session. For more information, see Why does it take so long to restore a snapshot of my Amazon RDS for MySQL DB instance?
  • To reduce the time of the log applying process as well as reduce the number of logs, take a manual snapshot between automated backups. PITR automatically selects automatic or manual snapshots that are created near the PITR time. Intermediate snapshots might reduce the number of logs that the PITR process must apply. If you change several things, then take a manual snapshot every 3-4 hours. For more information, see Introduction to backups.
  • To prevent breaks or disruptions to the PITR process, use a wait_timeout value of 600 seconds. If you use a low wait_timeout value when you replay large transactions, then the PITR process might fail. For more information, see the wait_timeout section of Best practices for configuring parameters for Amazon RDS for MySQL and wait_timeout on the MySQL website.
  • If you use row-based binary logging, then set the binlog_row_image parameter value to MINIMAL instead of FULL. When you update this value, the size of binary logs is reduced and the binlog recovery time is minimized. For more information, see binlog_row_image on the MySQL website.
  • If you don't need a specific binlog format, then use the MIXED logging format. When you use mixed logging, statement-based logging is used by default and the logging mode switches automatically to row-based. Automatic switching helps to reduce binlog sizes. For more information, see Binary logging formats on the MySQL website.
AWS OFFICIAL
AWS OFFICIALUpdated 2 months ago