
Redshift nodes: 14 x dc2.large vs 4 x ra3.xlplus - COPY performance drop


We recently switched our Redshift cluster from 14 x dc2.large to 4 x ra3.xlplus, as we really liked the idea of having storage decoupled from the nodes.

I expected some change in performance, since we now have 16 vCPU vs. 28 vCPU and 128 GB RAM vs. 210 GB. However, first tests showed a massive increase in loading times. First of all, the bucket still contains the same number of files: ~500,000, holding ~720,000 records in total. We are aware that this layout could be improved, but we want to compare the performance:

14 x dc2.large: 15 min
4 x ra3.xlplus: 42 min

For both loads, no other queries were running, and the COPY statement was exactly the same (STATUPDATE and COMPUPDATE off). CPU utilization shows all 4 nodes at max 50%. WLM is set to automatic, with the executing user's queries at high priority.

The same performance drop can be measured in bigger join operations, etc.

Is anyone else seeing the same behavior? Do these ra3 nodes need some time to work their magic (e.g., distinguishing between cold and hot data)?

3 Answers

From the AWS node conversion recommendations, I've mostly read that performance with DC2 instances roughly carries over to RA3. In your case, though, it looks like you almost cut the capacity in half when moving to RA3. Is this the AWS-recommended configuration?

answered 10 months ago

The documentation states the following:

"Create 3 nodes of ra3.xlplus for every 8 nodes of dc2.large"

So we would end up with 5.25 ra3.xlplus nodes. But as those nodes aren't cheap (~10k/year each), the decision wasn't easy. The documentation also states that S3 loads are improved in comparison to dc/ds nodes. But with loading times 3x longer(!) and simple joins taking twice as long, I'm quite speechless. That's why I'm looking for other experiences; it might be that performance improves over time. We started the new cluster yesterday, so it's fresh, and many loads are running on the ra3 cluster for the first time now.
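Just to show the math behind that number, here is a quick sketch in Python; the 3-for-8 ratio comes from the guideline quoted above, and the rounding up is my own reading of how to stay at or above the recommended capacity:

```python
import math

# "Create 3 nodes of ra3.xlplus for every 8 nodes of dc2.large"
dc2_nodes = 14
recommended = dc2_nodes * 3 / 8   # fractional recommendation
print(recommended)                 # 5.25
print(math.ceil(recommended))      # 6, if rounding up to whole nodes
```

With only 4 nodes, the cluster sits well below even the fractional recommendation.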

answered 10 months ago

Hi, Thank you for your patience.

For the load issue, I think that what you are experiencing is normal, considering the specific types of files you are loading (more than 500k files with a few rows each).

In this case the better throughput between S3 and Redshift would not help, because the most probable delay comes from opening all of the files, not from the time to move the few rows in each.

ra3.xlplus instances have the same number of slices per node as dc2.large, and each slice loads one file at a time. Based on the node counts you shared, you now have almost one third of the slices, so each slice has a longer queue of files to load.

Also, ra3.xlplus has 2 cores per slice, but only one is used by the load operation, hence the 50% utilization you see.
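A rough back-of-the-envelope check of that slice math (the 2-slices-per-node figure is an assumption based on the public node specs, not something measured on this cluster):

```python
# Assumed: dc2.large and ra3.xlplus each expose 2 slices per node.
SLICES_PER_NODE = 2
dc2_slices = 14 * SLICES_PER_NODE   # 28 slices on the old cluster
ra3_slices = 4 * SLICES_PER_NODE    # 8 slices on the new cluster

FILES = 500_000
dc2_queue = FILES // dc2_slices     # ~17857 files per slice
ra3_queue = FILES // ra3_slices     # 62500 files per slice
print(dc2_slices, ra3_slices)       # 28 8
print(dc2_queue, ra3_queue)
# Each ra3 slice works through a ~3.5x longer file queue,
# which is consistent with the ~3x slower COPY reported above.
```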

Definitely, as you already know, optimizing/consolidating the input files would be beneficial for both clusters.
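For illustration, a rough Python sketch of how one might plan such a consolidation, grouping the existing keys into a slice-aligned number of larger files (bucket name, key pattern, and the multiplier are illustrative assumptions, not values from this thread):

```python
# Assumed: 8 slices on the 4 x ra3.xlplus cluster (2 per node).
RA3_SLICES = 8
TARGET_FILES = RA3_SLICES * 4   # keep the file count a multiple of the slices

def plan_batches(keys, n_batches):
    """Group S3 keys round-robin into n_batches groups; each group
    would then be concatenated into one larger load file."""
    batches = [[] for _ in range(n_batches)]
    for i, key in enumerate(keys):
        batches[i % n_batches].append(key)
    return batches

# Hypothetical key listing standing in for the real bucket contents.
keys = [f"s3://example-bucket/part-{i:06d}.csv" for i in range(500_000)]
batches = plan_batches(keys, TARGET_FILES)
print(len(batches), len(batches[0]))   # 32 15625
```

With 32 evenly sized load files instead of ~500,000 tiny ones, every slice stays busy and the per-file open overhead mostly disappears.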

As for the join slowdown, it is not possible to provide any feedback without seeing the query plans (EXPLAIN output) from both clusters, knowing how much data is read during the query, and having an idea of the table design.

Could you provide additional information, or open a support case (if not already done)?

Hope this helps.

answered 9 months ago
