What are the best practices to speed up VACUUM command in Redshift after significant data updates?


I'm working on a POC on Amazon Redshift with a 30 TB table on a 16-node RA3.4XL cluster. After significant data updates, I ran VACUUM BOOST on this table (having compound sort keys) that took around 10 hours to complete. The CPU utilization during the vacuum window was not higher than 30%. What are some of the best practices to make the VACUUM command run faster?

I'm concerned about this long downtime because the data updates on the table might need to be run periodically (for example, to meet GDPR compliance) and the VACUUM command must be run after the updates are complete.

已提问 4 年前1823 查看次数
1 回答

Running the VACUUM command in Amazon Redshift is a very resource-intensive task. Running VACUUM on large tables online with lots of unsorted/deleted rows is not recommended.

Consider the following options instead:

  1. [Unload to Amazon S3][1] and then copy back for huge tables. Perform an [elastic resize][2] on your cluster to to two times the node count before performing these steps and revert to the original cluster size after these operations are complete. [1]: https://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html [2]: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-elastic-resize/

  2. Perform a deep copy, that is, create a new table and repopulate it using a bulk insert. Perform an elastic resize on your cluster before performing the deep copy.

  1. Create a manual snapshot and restore into a new cluster. When this operation is complete, you can choose to delete the original cluster and rename the new cluster. This might be the fastest option (around one to two hours) to remove large number of deleted rows.
已回答 4 年前

您未登录。 登录 发布回答。