Yes, it is possible. Briefly: when using DISTSTYLE ALL (as in the initial stage of DISTSTYLE AUTO), Redshift keeps a full replica of the table on the first slice of each node (not on every slice), so with 2 nodes the table takes 2x the space, and so on. In addition, during COPY the data files must be uncompressed, the columns encoded/compressed (using temporary space), and often sorted (more temporary space) before they can be written out to disk. Assuming a Parquet/Snappy compression ratio of about 3, that adds roughly a 3x factor on top.
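As a quick check on how much space the distribution style is costing you, here is a minimal sketch against the SVV_TABLE_INFO system view; the schema and table names are placeholders:

```sql
-- Sketch: inspect the distribution style and on-disk footprint of a table.
-- "my_schema" / "my_table" are placeholder names.
-- size is reported in 1 MB blocks; pct_used is the percentage of total
-- available cluster disk space this table consumes.
select "schema", "table", diststyle, size, pct_used, tbl_rows
from svv_table_info
where "schema" = 'my_schema'
  and "table"  = 'my_table';
```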
It is useful to keep a key metric in mind: a single Redshift slice can easily load >5 MB/sec per slice, i.e. ~18 GB/hour per slice (regardless of node type), with data size measured as raw (uncompressed) data. For large node types with 16 slices, you should expect ~200 GB/hour per node at a minimum. If your COPY takes significantly longer than expected, it is a good indication that something bad (like a space problem) has occurred. Here is a useful query to help troubleshoot "DISK FULL" issues:
```sql
select '2000-01-01'::timestamp + (currenttime / 1000000.0) * interval '1 second' as currenttime,
       node_num,
       query_id,
       temp_blocks
from pg_catalog.stl_disk_full_diag;
```
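To compare your actual COPY throughput against the ~5 MB/sec/slice baseline above, you can also look up how long recent COPY statements took. A minimal sketch using the STL_QUERY system table (the `ilike 'copy %'` filter and the 20-row limit are just illustrative choices):

```sql
-- Sketch: list recent COPY statements and their elapsed time, so the
-- observed duration can be compared against the expected load rate.
select query,
       trim(querytxt)                        as copy_statement,
       starttime,
       datediff(seconds, starttime, endtime) as elapsed_seconds
from stl_query
where querytxt ilike 'copy %'
order by starttime desc
limit 20;
```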
Recommendations:
- Change DISTSTYLE to EVEN or KEY (if a good DISTKEY can be found). Starting with EVEN is fine (see the sketch after this list).
- Load a small set of files into the target table first. The objective is to get a better idea of the Snappy compression ratio and of load performance (also sketched below).
- To speed things up, use a bigger cluster (Classic Resize). A quicker option is to Elastic Resize (ER) to 2x your current cluster size; when the load is done and validated, you can ER back down.
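As an illustration of the first two recommendations, here is a minimal sketch; the table name, S3 bucket/prefix, and IAM role ARN are placeholders, and Parquet input is assumed:

```sql
-- Sketch: switch the target table from ALL to EVEN distribution
-- (schema and table names are placeholders).
alter table my_schema.my_table alter diststyle even;

-- Sketch: trial-load a small subset of files by pointing COPY at a
-- narrow S3 prefix (bucket, prefix, and IAM role are placeholders).
copy my_schema.my_table
from 's3://my-bucket/my-prefix/part-0000'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
format as parquet;
```

Comparing the table size before and after such a trial load gives you the real compression ratio to plug into your capacity estimate.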
Two useful references:
- Loading data from Amazon S3: https://docs.aws.amazon.com/redshift/latest/dg/t_Loading-data-from-S3.html
- How can I troubleshoot high or full disk usage with Amazon Redshift?: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-high-disk-usage/