Introduction
The article will include system tables and views which can be used to check copy/unload performance for slowness and common errors that can be verified to resolve it.
Redshift Provisioned cluster
a. Queries to check slowness in Copy and Unload queries with comparison between slow/fast queries:
- If the copy issue is live please find the current state of ongoing COPY statements. It can help monitor the progress of loading of records.
select * from stv_load_state where query=<Query ID> order by recordtime,slice;
- You may use the below query to check and find out which phase of copy took time.
Select userid,xid,pid,trim(label) as trim_label,starttime,endtime,sequence,trim(type) as trim_type,left(text,60) as left_text from svl_statementtext where xid =<xid-transaction id> order by starttime,sequence;
- To find the time spent transferring data from Amazon S3 and to know number of files processed per slice. You may identify if all slices are fetching almost same number of files from S3 and each file has same amount of data. You can use to check both copy and unload queries.
select userid,query,slice,recordtime,pid,trim(http_method) as http_method,trim(bucket) as bucket,trim(key) as key ,transfer_size,data_size,transfer_time,compression_time,connect_time,retries from stl_s3client where query = <Query ID> order by recordtime,slice;
You may check the number of files with slice distribution and check for any skewness:
select slice, count(1) from stl_s3client where query=<Query ID> group by 1 order by 1;
- To track the progress of each data file as it is loaded into a database table.
select * from stl_load_commits where query=<Query ID>;
To check the count of files while comparing the queries:
select query, count(distinct filename) from stl_load_commits where query = <Query ID> group by query order by query;
- To check the load time of any files parallel data loads using the COPY command.
select userid, query,slice,name,lines,bytes,loadtime from STL_FILE_SCAN where query = <Query ID> order by query,slice;
- To check details for an unload operation and for log analysis on the slices processing the s3 file path.
select userid,query,path,start_time,end_time,line_count,transfer_size,file_format from STL_UNLOAD_LOG where query = <Query ID> order by start_time,slice;
- To check the query duration and execution report you may check the below query for copy/unload.
SELECT * FROM stl_query WHERE query=<Query ID>;
select * from svl_query_report where query = <Query ID> order by segment, step, elapsed_time, rows;
b. Please find the below queries to check for load errors and character replacement during copy/unload::
- To find details of the failure of load error explanation code. You can query STL_LOADERROR_DETAIL for additional details, such as the exact data row and column where a parse error occurred.
select * from stl_load_errors where query = <Query ID>;
select userid,slice,query,filename,line_number,field,colname,value,is_null,type,col_length from STL_LOADERROR_DETAIL where query = <Query ID>;
- To find details for errors encountered while transferring data from Amazon S3 as part of a COPY command.
select userid,query,sliceid,recordtime,http_method,bucket,key,error,is_partial,start_offset from STL_S3CLIENT_ERROR where query = <Query ID> order by query, sliceid;
- If you are using ACCEPTINVCHARS option in copy command it will display a log that records when invalid UTF-8 character were replaced.
select query, session, filename, line_number, colname from stl_replacements where query =<Query ID>;
Redshift Serverless/provisioned cluster and system views to check user query id details :
- To find the details of the copy or the unload query.
SELECT * FROM sys_query_history where query_type = 'COPY/UNLOAD' and query_id=<Query ID>;
- To view details of COPY commands and to track the progress of each data file as it is loaded into a database table.
SELECT user_id,query_id,database_name,status,table_name,start_time,end_time,data_source,file_format,loaded_rows,loaded_bytes,source_file_count,source_file_bytes,file_count_scanned,file_bytes_scanned FROM sys_load_history WHERE query_id=<Query ID> ORDER BY query_id;
SELECT query_id, trim(file_name) as file, record_time,bytes_scanned,lines_scanned,splits_scanned FROM sys_load_detail WHERE query_id=<Query ID> ORDER BY query_id;
- To view details of UNLOAD commands. Each row represents a UNLOAD command with accumulated statistics for some of the fields. It contains both running and finished UNLOAD commands.
SELECT query_id,file_format,start_time,end_time,duration,unloaded_rows,unloaded_files_count,error_message FROM sys_unload_history WHERE query_id=<Query ID> ORDER BY query_id;
- To view details of UNLOAD commands. Each row represents a UNLOAD command with accumulated statistics for some of the fields. It contains both running and finished UNLOAD commands.
SELECT query_id,file_format,start_time,end_time,duration,unloaded_rows,unloaded_files_count,error_message FROM sys_unload_history WHERE query_id=<Query ID> ORDER BY query_id;
- To view the unloaded query details, including format, rows, and file count of unload command.
select query_id, substring(file_name, 0, 50), transfer_size, file_format from sys_unload_detail WHERE query_id=<Query ID>;
- To records the progress of each data file as it is loaded into a database table similar to stl_load_errors you may use sys_load_error_detail.
select query_id,trim(file_name) AS file_name,trim(column_name) AS column_name, trim(column_type) AS column_type, trim(error_message) AS error_message from sys_load_error_detail where query_id=<Query ID>;
- The below sys view is similar to stl_replacements.
select query_id, table_id, file_name, line_number, column_name from sys_copy_replacements where query_id=<Query ID>;
Please find the below references which you can use additionally for troubleshooting copy and unload queries:
[+] Troubleshooting data loads
[+] Amazon Redshift best practices for loading data