How do I avoid the long point-in-time recovery period of my RDS for MySQL DB instance?

4 minute read
0

I want to avoid a long wait for my point-in-time recovery (PITR) of a DB instance in Amazon Relational Database Service (Amazon RDS) for MySQL.

Short description

When you perform a PITR, you restore a DB instance to a specific point in time and create a new DB instance. When you initiate a PITR, you can either restore from the Latestrestorabletime or restore from a specific date and time. The most recent automated or manual backup that occurred before the specified time is restored. The transactional logs are then applied to transition the database to the PITR time.

The following factors affect PITR time:

  • The required time to perform the volume restore.
  • The quantity and size of transaction logs that existed between the time of the last automated snapshot and the PITR API call.
  • The applied contents of the transaction logs that are part of the instance recovery.
  • The types of DB instance classes. Different instance classes might have different I/O throughput that can affect log applying performance.

The following example scenarios cause PITR issues:

  • The DB instances are in the storage full state. For more information, see Working with backups.
  • You use MyISAM tables in MySQL. For more information, see Automated backups with unsupported MySQL storage engines.
  • You use temporary tables, in-memory tables, and distributed transactions. For more information, see Diagnosing and resolving point-in-time restore failures.
  • If you set sync_binlog to 0, then the performance overhead that's required to sync the binary logs might be reduced. Also, when there's a power failure or operating system (OS) crash, some commits might not sync to the binary logs. This asynchronization might affect PITR capabilities. It's a best practice to set sync_binlog to 1 so that you don't lose events from binary logs. For more information, see sync_binlog on the MySQL website.

Note: If you can't connect to your instance after the PITR, then check your security group. When you restore a DB instance to a point in time, the default DB security group is applied to the new DB instance.

If you need to apply your custom DB security groups to your new instance, then use one of the following methods:

Resolution

To avoid a long PITR period, use the following best practices:

  • Create a disaster recovery strategy.
  • Use smaller transactions, and frequently run the COMMIT command.
  • When you run a large transaction, take a snapshot before and after the large transactions. Transactions that are larger than the max_allowed_packet parameter cause PITR to fail. For more information, see max_allowed_packet on the MySQL website.
  • Minimize snapshot restore times. Snapshot restores are initiated as part of the PITR process. A longer snapshot restore might contribute to a longer PITR session. For more information, see Why is it taking so long to restore a snapshot of my Amazon RDS for MySQL DB instance?
  • The log applying process can take more time based on the number of logs to apply. To reduce the number of logs to apply, take a manual snapshot between the automated backups. PITR automatically selects automatic or manual snapshots that are created near the PITR time. Intermediate snapshots might reduce the number of logs to apply. If you're making several changes, then take a manual snapshot every 3-4 hours. For more information, see Backup window.
  • If you replay any large transactions, then a low wait_timeout value might break the PITR processes in Amazon RDS for MySQL. For example, breaks occur when you're performing a large row-based bulk update, insert, or delete and the replay takes longer than the wait_timeout. To prevent breaks or disruptions in the PITR process, set the wait_timeout value to 600 (10 minutes) or more. For more information, see the wait_timeout section of Best practices for configuring parameters for Amazon RDS for MySQL and wait_timeout on the MySQL website.
  • If you use row-based binary logging, then set the binlog_row_image parameter value to MINIMAL, not FULL. When you update this value, the size of binary logs is reduced and the binlog recovery time is minimized. For more information, see binlog_row_image on the MySQL website.
  • If you don't need a specific binlog format, then use the MIXED logging format. When you use mixed logging, statement-based logging is used by default and the logging mode switches automatically to row-based. Automatic switching helps to reduce binlog sizes. For more information, see Binary logging formats on the MySQL website.
AWS OFFICIAL
AWS OFFICIALUpdated 8 months ago