What are the best practices to speed up VACUUM command in Redshift after significant data updates?


I'm working on a POC on Amazon Redshift with a 30 TB table on a 16-node ra3.4xlarge cluster. After significant data updates, I ran VACUUM BOOST on this table (which has a compound sort key), and it took around 10 hours to complete. CPU utilization during the vacuum window stayed below 30%. What are some best practices to make the VACUUM command run faster?

I'm concerned about this long downtime because the data updates on the table might need to be run periodically (for example, to meet GDPR compliance) and the VACUUM command must be run after the updates are complete.

Asked 4 years ago · 1,825 views
1 Answer
Accepted Answer

Running the VACUUM command in Amazon Redshift is a very resource-intensive task. Running VACUUM online against a large table with a high proportion of unsorted or deleted rows is not recommended.

Consider the following options instead:

  1. For huge tables, [unload to Amazon S3](https://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html) and then copy the data back. Before performing these steps, do an [elastic resize](https://aws.amazon.com/premiumsupport/knowledge-center/redshift-elastic-resize/) of your cluster to two times the node count, and revert to the original cluster size after the operations are complete.
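The unload-and-reload step above can be sketched as follows. This is a minimal sketch, not the answer author's exact commands; the table name `big_table`, the S3 path, and the IAM role ARN are placeholders you would replace with your own.

```sql
-- Placeholders: big_table, s3://my-bucket/big_table/, and the IAM role ARN.
-- UNLOAD writes the table to S3 in parallel across the slices.
UNLOAD ('SELECT * FROM big_table')
TO 's3://my-bucket/big_table/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS PARQUET;

-- Empty the table so the reload starts from a clean, fully sorted state,
-- then COPY the data back in; loading into an empty table with a sort key
-- writes the rows in sorted order, so no VACUUM is needed afterwards.
TRUNCATE big_table;

COPY big_table
FROM 's3://my-bucket/big_table/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS PARQUET;
```

Note that TRUNCATE commits immediately and cannot be rolled back, so make sure the UNLOAD finished successfully before emptying the table.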

  2. Perform a deep copy, that is, create a new table and repopulate it using a bulk insert. Perform an elastic resize on your cluster before performing the deep copy.
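A deep copy might look like the following sketch, again with `big_table` as a placeholder name. `CREATE TABLE ... LIKE` inherits the distribution style, sort key, and column encodings from the source table, and the bulk insert into the empty table writes the rows in sort-key order, so the result has no deleted rows and is fully sorted.

```sql
-- Placeholder name: big_table. Create an empty copy with the same
-- dist style, sort key, and encodings, then repopulate it in bulk.
CREATE TABLE big_table_copy (LIKE big_table);
INSERT INTO big_table_copy SELECT * FROM big_table;

-- Swap the tables, then drop the old (bloated) one.
ALTER TABLE big_table RENAME TO big_table_old;
ALTER TABLE big_table_copy RENAME TO big_table;
DROP TABLE big_table_old;
```

The swap briefly requires storage for both copies of the table, which is another reason to resize the cluster up before starting.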

  3. Create a manual snapshot and restore it into a new cluster. When this operation is complete, you can delete the original cluster and rename the new one. This might be the fastest option (around one to two hours) to remove a large number of deleted rows.
Answered 4 years ago
