I want to load large volumes of data from Amazon Simple Storage Service (Amazon S3) into Amazon Redshift and maintain optimal performance.
Short description
To add data from Amazon S3 to your Amazon Redshift tables, You can use the INSERT or COPY command. However, It's a best practice to use the COPY command because it's more efficient. COPY uses massively parallel processing (MPP) architecture to simultaneously load data from files in an S3 bucket.
For other methods to use, see Loading data in Amazon Redshift.
Resolution
Run one COPY command
Run only one COPY command to load one table. If you use multiple concurrent COPY commands to load one table from multiple files, then Amazon Redshift performs a serialized load and might slow down.
Prepare data files
To efficiently use the COPY command, you must also prepare the data files. The number of files and their size affects parallel processing capabilities.
If you specify multiple files in the COPY command, then Amazon Redshift loads the data in parallel. To prepare your data files, make sure that they are approximately the same size, with a range from 1 MB to 1 GB.
For a Redshift provisioned cluster, it's a best practice to use a number of files that's a multiple of the number of slices. For Amazon Redshift Serverless, it's a best practice to use a number files that's a multiple of 128. For more information, see Loading data files.
To get the number of slices for a Redshift provisioned cluster, run the following query:
select count(*) from STV_SLICES where type = 'D';
Note: When you specify one uncompressed .csv, Parquet, or ORC file in the COPY command, Amazon Redshift splits files that are larger than 128 MB.
Monitor the COPY command status
To view the COPY command status, you can check the system tables.
The system tables record the COPY command details, including the number of files, the file size, and the number of loaded rows. You can monitor the tables to determine performance improvements that you can make and the cause of a COPY command failure.
For more information about the COPY command, see SYS_LOAD_HISTORY and SYS_LOAD_DETAIL.
For more information about COPY command errors, see SYS_LOAD_ERROR_DETAIL.
Related information
Amazon Redshift best practices for loading data
Loading tables with the COPY command
Loading data from compressed and uncompressed files