Troubleshooting Redshift Spectrum query performance and errors using system logs and views.

5 minute read
Content level: Intermediate

This article lists the system logs and tables that can be used to diagnose Redshift Spectrum slowness and errors.

Introduction

The Redshift Spectrum feature allows you to efficiently query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. This article covers the system tables and views that you can use to check Spectrum performance for slowness, along with common errors and how to verify and resolve them.


Redshift Provisioned cluster


a. Use the following troubleshooting queries to check Spectrum performance:

  1. To get details about Amazon Redshift Spectrum queries at the segment and node slice level. This lets you compare the scanned rows and returned rows at each segment, slice, and step for a fast versus a slow query ID.
select userid, query, segment, step, node, slice,
       round(datediff(ms, min(starttime), max(endtime))::numeric/1000, 2) as spectrum_s,
       sum(elapsed) as elapsed,
       sum(s3_scanned_rows) as s3_scanned_rows,
       sum(s3_scanned_bytes) as s3_scanned_bytes,
       sum(s3query_returned_rows) as s3query_returned_rows,
       sum(s3query_returned_bytes) as s3query_returned_bytes,
       sum(files) as s3_files
from svl_s3query
where query = <query ID>
group by 1,2,3,4,5,6
order by 2,3,4,5,6;
  1. To get details about Amazon Redshift Spectrum queries at the segment and node slice level, and to check log-level entries.
select query,segment,step,node,slice,eventtime, message from svl_s3log where query = <query ID> order by query,segment,slice;
  1. The query below can be used to check the files, splits, retries (if any), parallelism, and whether the scan is partitioned, at the segment level.
select userid, query, xid, pid, starttime, endtime, external_table_name, is_partitioned,
       s3_scanned_rows, s3_scanned_bytes, s3query_returned_rows,
       files, files_max, files_avg, splits, splits_max, splits_avg,
       total_retries, max_retries, avg_request_duration,
       max_request_parallelism, avg_request_parallelism
from svl_s3query_summary
where query = <query ID>
order by segment, step;

The recommendation for Redshift Spectrum is to use files larger than 64 MB and to avoid data size skew by keeping files about the same size. Note that a small number of files limits the benefits of parallel processing. The number of data slices can be increased in scenarios where the total file splits exceed avg_request_parallelism multiplied by the number of slices in the cluster. This can be confirmed from the SQL above.
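As a quick sketch of that check (the query ID is a placeholder, as in the queries above), you can put the total splits, the average request parallelism, and the cluster's slice count side by side:

select q.query,
       sum(q.splits) as total_splits,
       avg(q.avg_request_parallelism) as avg_request_parallelism,
       (select count(*) from stv_slices) as cluster_slices
from svl_s3query_summary q
where q.query = <query ID>
group by q.query;

If total_splits is much larger than avg_request_parallelism multiplied by cluster_slices, resizing the cluster to add slices can increase Spectrum scan parallelism.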

  1. To check the maximum file size and bucket information from svl_s3list at the segment level. You can also compare the average file size and the total number of files from svl_s3list to spot any skew.
select query, segment, slice, node, bucket,
       count(distinct prefix) as num_prefixes,
       min(eventtime) as min_eventtime,
       max(eventtime) as max_eventtime,
       sum(retrieved_files) as total_files,
       round(max(max_file_size)::float/1000/1000, 2) as max_file_mb,
       round(avg(avg_file_size)::float/1000/1000, 2) as avg_file_mb
from svl_s3list
where query = <query ID>
group by 1,2,3,4,5
order by 2;
  1. To get details about Amazon Redshift Spectrum partitions at the segment and node slice level. Using svl_s3partition_summary, you can get details of qualified_partitions.
select query, segment, node, slice, starttime, endtime, duration, total_partitions, qualified_partitions, assigned_partitions, assignment 
from svl_s3partition where query=<query ID> order by query, segment,node, slice;

select * from svl_s3partition_summary where query=<query ID> order by query,segment;

The SQL above shows the effectiveness of partition pruning. When you partition your data, you can restrict the amount of data that Redshift Spectrum scans by filtering on the partition key. You can partition your data by any key. A common practice is to partition the data based on time. For example, you might choose to partition by year, month, date, and hour. If you have data coming from multiple sources, you might partition by a data source identifier and date.
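For example, a time-partitioned external table and a query that prunes on the partition keys might look like the following (the schema, table, and bucket names are illustrative, not from your environment):

create external table spectrum_schema.sales (
  sale_id integer,
  amount decimal(10,2)
)
partitioned by (year integer, month integer)
stored as parquet
location 's3://example-bucket/sales/';

alter table spectrum_schema.sales
  add partition (year=2023, month=6)
  location 's3://example-bucket/sales/year=2023/month=6/';

-- Filtering on the partition keys lets Spectrum skip all other partitions:
select count(*) from spectrum_schema.sales where year = 2023 and month = 6;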

To verify and view details for partitions in external tables, run:

select schemaname, tablename, count(*) from svv_external_partitions where tablename='table_name' group by 1,2;
  1. To check how much time was spent scanning and filtering partitions from the data catalog, and whether time was lost to a high number of partitions on the external table. Note that the duration column is the total time, in microseconds, spent requesting metadata from the data catalog.
select query,segment,node,slice,min(eventtime),max(eventtime),sum(call) as catalog_calls from svl_s3catalog where query= <query ID> group by 1,2,3,4 order by 1,2,3,4;

If a slow and a fast query both use all of the table's partition columns in their predicates, compare them: the slow query may still be spending its extra time in data catalog calls.
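To make that comparison concrete (both query IDs are placeholders, following the same convention as above), you can total the catalog calls for the fast and slow queries side by side:

select query, sum(call) as catalog_calls
from svl_s3catalog
where query in (<fast query ID>, <slow query ID>)
group by query;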

b. Use the following troubleshooting queries to check Spectrum errors:

  1. To check the Spectrum request summary and failed Spectrum requests.
select query, pid, segment, node, slice, eventtime, retry_count,
       returned_rows, returned_bytes, s3_scanned_rows, s3_scanned_bytes,
       row_groups, skipped_row_groups, skipped_rows, fetched, file_size, failed
from svl_s3requests
where query = <query ID>
order by segment, slice, node;
  1. To troubleshoot “Spectrum Scan Error: Retries exceeded” errors, you can use the system table below to get information about why an Amazon Redshift Spectrum query against Amazon S3 failed.
select query,segment,node,slice,retries,successful_fetches,file_size,location,message from svl_s3retries where query = <query ID> order by eventtime,retries;
  1. To retrieve the complete error message for errors such as “Incompatible data formats”, you can query the system view below.
select query,segment,step,node,slice,eventtime, message from svl_s3log where query = <query ID> order by query,segment,slice;
  1. You can query the below system view to get information about Redshift Spectrum scan errors.
select userid,location,rowid,colname,original_value,modified_value,trigger,action,action_value,error_code from svl_spectrum_scan_error where query=<query ID>; 
  1. To check for any S3 I/O failures occurring on the S3 bucket.
select http_method, bucket,error, count(*) from stl_s3client_error group by 1,2,3;

Redshift Serverless

  1. To get details such as the number of rows processed, the number of bytes processed, and partition info for external tables in Amazon S3, at the segment level.
SELECT query_id, segment_id, start_time, end_time, total_partitions, qualified_partitions, scanned_files, returned_rows, returned_bytes, trim(external_query_text) query_text, trim(file_location) file_location FROM sys_external_query_detail where query_id=<query ID> ORDER BY query_id, segment_id;
  1. To get information about Redshift Spectrum scan errors.
select * from sys_external_query_error where query_id=<query ID>;

The following references can be used for further troubleshooting of Spectrum performance and errors:

[+] Query troubleshooting in Amazon Redshift Spectrum

[+] Best Practices for Amazon Redshift Spectrum

[+] Troubleshooting query performance

AWS
EXPERT
published 14 days ago · 206 views