Redshift COPY command CPU spikes

A customer is seeing CPU spikes (close to 100%) on all compute nodes while running the COPY command to load data from S3 into a Redshift cluster. They are running a cluster of 2 ds2.xlarge nodes, which gives a total of 4 slices working in parallel. After noticing the spikes, they added a third node to the cluster but are still seeing the same behavior.

The files being copied are CSV, compressed with GZIP.

Has anyone seen this issue before? Any pointers would be very helpful, thanks!

Asked 7 years ago, viewed 748 times
1 answer
Accepted answer

While proper splitting of files is very important and highly recommended, it shouldn't cause a CPU spike across the cluster.

The usual cause of a CPU spike like the one you're describing is loading into a table that has no compression settings. By default, COPY behaves as if COMPUPDATE is ON when the target table is empty and its columns have no encodings. Redshift then takes a sample of the incoming rows, runs it through every compression encoding we have, and applies the most appropriate (smallest) encoding to each column.
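To check whether the target table is in this state, one option is to look at the `encoding` column of `pg_table_def`. The following is a minimal sketch, not a definitive implementation: the helper name `unencoded_columns` is hypothetical, the rows are assumed to have been fetched with whatever SQL client you already use, and the assumption that unencoded columns are reported as `none` (or `raw`) should be verified against your cluster. Note that `pg_table_def` only lists tables in schemas on your search path.

```python
# Query text only; run it with your own client. %s is a driver-style placeholder.
PG_TABLE_DEF_QUERY = (
    'SELECT "column", type, encoding '
    "FROM pg_table_def "
    "WHERE tablename = %s;"
)

# Assumption: labels that indicate a column has no compression encoding.
RAW_ENCODINGS = {"none", "raw"}


def unencoded_columns(rows):
    """Return the names of columns whose encoding looks like raw/none.

    rows: iterable of (column_name, data_type, encoding) tuples,
    as returned by PG_TABLE_DEF_QUERY. A missing (None) encoding is
    treated as unknown and is not flagged.
    """
    return [
        name
        for name, _dtype, enc in rows
        if (enc or "").strip().lower() in RAW_ENCODINGS
    ]
```

If this returns every column of an empty target table, the COPY is likely to trigger the automatic compression analysis described above.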

To fix the issue, make sure that compression is already applied to the target table of the COPY statement. If necessary, run the ANALYZE COMPRESSION command to find out what the encodings should be, then apply them manually in the DDL. For temporary tables, LZO can be an excellent choice because it is faster to encode on these transient tables than, say, ZSTD. To be safe, also set COMPUPDATE OFF in the COPY statement.
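The steps above could look roughly like the sketch below. It only builds the SQL text and does not connect to Redshift. All table, column, bucket, and IAM role names are hypothetical placeholders, the helper functions are made up for illustration, and the encodings shown are examples rather than what ANALYZE COMPRESSION would recommend for your data.

```python
# Sketch only: builds SQL strings; table, bucket, and role names are placeholders.

def build_analyze_compression(table):
    """Statement that asks Redshift to suggest an encoding per column."""
    return f"ANALYZE COMPRESSION {table};"


def build_create_table(table, columns):
    """DDL with an explicit ENCODE clause on every column.

    columns: list of (name, data_type, encoding) tuples.
    """
    cols = ",\n".join(
        f"    {name} {dtype} ENCODE {encoding}"
        for name, dtype, encoding in columns
    )
    return f"CREATE TABLE {table} (\n{cols}\n);"


def build_copy(table, s3_prefix, iam_role):
    """COPY for gzip-compressed CSV with automatic compression analysis off."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "CSV GZIP\n"
        "COMPUPDATE OFF;"
    )


if __name__ == "__main__":
    print(build_analyze_compression("staging_events"))
    print(build_create_table(
        "staging_events",
        [("event_id", "BIGINT", "zstd"), ("payload", "VARCHAR(256)", "lzo")],
    ))
    print(build_copy(
        "staging_events",
        "s3://example-bucket/events/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

With encodings fixed in the DDL and COMPUPDATE OFF in the COPY, the load should skip the per-column encoding comparison that drives the CPU spike.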

AWS
Expert
Answered 7 years ago
