- Newest
- Most votes
- Most comments
This issue appears to be related to how Amazon Redshift distributes data across slices after scaling your cluster. When you scale a Redshift cluster, the data distribution doesn't automatically rebalance across all slices optimally.
Here's what's happening and what you can try:
-
Data Distribution Skew: Even though you have 8 slices (2 per node) in your 4-node ra3.xlplus cluster, your data is not being evenly distributed across all slices. This is causing CPU imbalance with older nodes working harder than newer ones.
-
VACUUM and ANALYZE: Run a full vacuum and analyze on your tables. This can help redistribute data and update statistics:
VACUUM FULL;
ANALYZE;
- Check Data Distribution: Run a diagnostic query to identify tables with data skew:
select trim(pgn.nspname) as schema,
trim(a.name) as table, id as tableid,
decode(pgc.reldiststyle,0, 'even',1,det.distkey ,8,'all') as distkey,
dist_ratio.ratio::decimal(10,4) as skew,
det.head_sort as "sortkey",
det.n_sortkeys as "#sks", b.mbytes,
decode(b.mbytes,0,0,((b.mbytes/part.total::decimal)*100)::decimal(5,2)) as pct_of_total,
decode(det.max_enc,0,'n','y') as enc, a.rows,
decode( det.n_sortkeys, 0, null, a.unsorted_rows ) as unsorted_rows ,
decode( det.n_sortkeys, 0, null, decode( a.rows,0,0, (a.unsorted_rows::decimal(32)/a.rows)*100) )::decimal(5,2) as pct_unsorted
from (select db_id, id, name, sum(rows) as rows,
sum(rows)-sum(sorted_rows) as unsorted_rows
from stv_tbl_perm a
group by db_id, id, name) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
left outer join (select tbl, count(*) as mbytes
from stv_blocklist group by tbl) b on a.id=b.tbl
inner join (select attrelid,
min(case attisdistkey when 't' then attname else null end) as "distkey",
min(case attsortkeyord when 1 then attname else null end ) as head_sort ,
max(attsortkeyord) as n_sortkeys,
max(attencodingtype) as max_enc
from pg_attribute group by 1) as det
on det.attrelid = a.id
inner join ( select tbl, max(mbytes)::decimal(32)/min(mbytes) as ratio
from (select tbl, trim(name) as name, slice, count(*) as mbytes
from svv_diskusage group by tbl, name, slice )
group by tbl, name ) as dist_ratio on a.id = dist_ratio.tbl
join ( select sum(capacity) as total
from stv_partitions where part_begin=0 ) as part on 1=1
where mbytes is not null
order by mbytes desc;
- Deep Copy with DISTSTYLE AUTO: For tables showing significant skew, try recreating them with DISTSTYLE AUTO instead of EVEN or specific DISTKEY. This lets Redshift choose the optimal distribution style:
CREATE TABLE new_table DISTSTYLE AUTO AS SELECT * FROM original_table;
-
Batch Processing for Large Tables: For very large tables, consider using a batched approach for the deep copy operations to improve performance.
-
Check for Missing Statistics: Run this query to identify tables with missing statistics that might be affecting distribution:
select substring(trim(plannode),1,100) as plannode, count(*)
from stl_explain
where plannode like '%missing statistics%'
group by plannode
order by 2 desc;
If these steps don't resolve the issue, you may need to contact AWS Support as there could be an underlying configuration issue with your cluster after scaling.
Sources
Redshift diagnostics queries to identify the performance bottleneck | AWS re:Post
Issue distributing data among nodes | AWS re:Post
Data distribution for query optimization - Amazon Redshift
Improve COPY ingestion performance for large data loads on Amazon Redshift | AWS re:Post
Relevant content
- asked 7 months ago
- AWS OFFICIALUpdated 2 months ago
- AWS OFFICIALUpdated 3 months ago
