Amazon Redshift Serverless: Monitoring Queries and Troubleshooting Performance using system(SYS) views

12 minute read
Content level: Intermediate
2

This article provides guidance on various system views and queries that can help you monitor and troubleshoot your queries on Redshift Serverless.

Introduction

This article introduces key system tables that facilitate monitoring active queries on your Amazon Redshift Serverless, checking query status, gathering execution metrics, troubleshooting queries, and implementing billing controls and best practices for Serverless. By leveraging these system tables, you can gain valuable insights into query execution behavior and cost management strategies for your Redshift Serverless environment.

Queries to use for monitoring on Redshift Serverless:

  1. Get Active Queries from SYS_QUERY_HISTORY:
SELECT * FROM sys_query_history WHERE status IN ('planning', 'queued', 'running', 'returning') ORDER BY start_time;

This query retrieves all currently running and queued queries from the sys_query_history view, ordered by their start time. It helps you identify long-running queries or queued queries that may be waiting for resources.

  1. Checking Current active and recent Completed Queries from SYS_QUERY_DETAIL:
SELECT * FROM sys_query_history ORDER BY start_time desc limit 100;

This query retrieves all queries, both current and completed, from the sys_query_history view, sorted in descending order by their start time. It provides a comprehensive view of all query activities, including their status, execution time, and other relevant details.

  1. Check query execution details of a query at step level to get all query execution metrics.
SELECT * FROM sys_query_detail WHERE query_id IN (<query_id>) ORDER BY query_id,child_query_sequence,stream_id,segment_id,step_id;

The SYS_QUERY_DETAIL view retrieves detailed information about a specific query or set of queries identified by their query_id. It provides insights into the query execution plan, including child queries, streams, segments, steps, tables and data scanned, skewed information.

Key metrics to watch out for are the table_name, step_name, duration for each step, blocks_read, input and output rows and bytes, data skewness, disk spill, and any alerts related to the steps in the query. This information can be valuable for troubleshooting and optimizing complex or slow-running queries.

  1. Segment level execution details of a query:
select user_id, query_id, stream_id, segment_id, start_time, end_time, duration,table_name, step_name, alert,
blocks_read, blocks_write, local_read_blocks local_read_io, remote_read_blocks remote_read_io 
from sys_query_detail where query_id = <query_id> and step_id = -1 
order by query_id,stream_id, segment_id;

You can add multiple query ids to compare performance of a slow and fast query using SYS_QUERY_HISTORY and SYS_QUERY_DETAIL tables.

  1. Identifying Locking and Blocking Sessions on Serverless:
SELECT a.txn_owner user_id, a.txn_db database_name, a.xid transaction_id, a.pid session_id, a.txn_start start_time, a.lock_mode, a.relation table_id, nvl(trim(d.relname), '') AS table_name, a.granted, b.pid AS blocking_session_id, datediff(s, a.txn_start, getdate()) / 86400 || ' days ' || datediff(s, a.txn_start, getdate()) % 86400 / 3600 || ' hrs ' || datediff(s, a.txn_start, getdate()) % 3600 / 60 || ' mins ' || datediff(s, a.txn_start, getdate()) % 60 || ' secs' AS txn_duration
FROM svv_transactions a LEFT JOIN (SELECT pid, relation, granted FROM pg_locks GROUP BY 1, 2, 3) b 
ON a.relation = b.relation AND a.granted = 'f' AND b.granted = 't' LEFT JOIN pg_class d ON a.relation = d.oid 
WHERE a.relation IS NOT NULL;

This query identifies sessions that are holding locks and blocking other sessions waiting for a lock. It provides information about the user, database, transaction ID, session ID, lock mode, table involved, and the duration of the transaction. If you identify a blocking session holding a lock for an extended period, you can check if the associated transaction or query is active or left open using the sys_query_history view.

  1. Terminating Blocking Sessions:
SELECT pg_terminate_backend(<blocking_session_pid>);

If you've identified a blocking session that needs to be terminated, you can use this query to terminate the backend process associated with the blocking session. Replace <blocking_session_pid> with the actual process ID of the blocking session.

  1. Alerts for query tuning recommendations:
select user_id, query_id, stream_id, segment_id, step_id, step_name, 
table_id, trim(table_name) table_name, trim(alert) alert 
from sys_query_detail 
where query_id = <query_id> and trim(alert) is not null order by segment_id;
  1. Check workload by query type and counts:
select date_trunc('day', start_time) days, status, query_type, count(*) from sys_query_history group by 1, status, query_type  order by 1;
  1. To review daily summary of query execution metrics for a week.
SELECT 
    DATE_TRUNC('day', start_time) as exec_day,
    COUNT(*) as queries_count,
    ROUND(AVG(queue_time/1000000.0), 2) as avg_queuing_sec,
    ROUND(AVG(execution_time/1000000.0), 2) as avg_exec_sec,
    ROUND(AVG(compile_time/1000000.0), 2) as avg_compile_sec,
    ROUND(AVG(planning_time/1000000.0), 2) as avg_planning_sec,
    ROUND(AVG(lock_wait_time/1000000.0), 2) as avg_lock_wait_sec,
    ROUND(AVG(elapsed_time/1000000.0), 2) as avg_elapsed_sec,
    SUM(returned_rows) as sum_ret_rows,
    SUM(returned_bytes) as sum_ret_bytes,
    trim(redshift_version) as redshift_version
FROM sys_query_history
WHERE start_time > DATEADD(day, -7, CURRENT_DATE) AND user_id <> 1
GROUP BY exec_day,redshift_version
ORDER BY exec_day;

This query provides daily performance analysis query that tracks Redshift database metrics over the last week, showing average processing times, query counts, data volume, and version information, helping monitor database efficiency and workload patterns.

  1. Retrieve Table information details like size, rows, dist and sort key info, unsorted and stats_off etc and run VACUUM and ANALYZE wherever required.
  • Table details:
SELECT * FROM svv_table_info;
  • Analyze History(Statistics):
SELECT * FROM sys_analyze_history order by start_time desc;
  • Vacuum history:
SELECT * FROM sys_vacuum_history order by start_time desc;

These queries provide information about table statistics, analyze operations, and vacuum operations. Monitoring table statistics is crucial for maintaining optimal query performance, as outdated or missing statistics can lead to inefficient query execution plans.

  1. Monitoring and Troubleshooting Load and Unload Operations:
SELECT * FROM sys_load_history WHERE query_id=<query_id> ORDER BY start_time desc;
SELECT * FROM sys_load_error_detail WHERE query_id=<query_id> ORDER BY start_time desc;
SELECT * FROM sys_unload_history ORDER BY query_id, start_time;

These queries provide detailed information about load and unload operations, including their status, start time, and any associated errors. Replace <query_id> with the actual query ID of the load or unload operation you want to monitor or troubleshoot.

  1. Summary of Load and Unload operations on daily basis.

COPY or Load jobs:

SELECT
    date_trunc('day', start_time) AS exec_day,
    count(query_id) as no_of_queries,
    ROUND(AVG(duration / 1000000.0), 2) as avg_duration_sec,
    SUM(loaded_rows) AS loaded_rows,
    SUM(loaded_bytes) AS loaded_bytes
FROM sys_load_history
GROUP BY exec_day
ORDER BY exec_day DESC;

Unload or Export jobs:

SELECT
    date_trunc('day', start_time) AS exec_day,
    count(query_id) as no_of_queries,
    ROUND(AVG(duration / 1000000.0), 2) as avg_duration_sec,
    SUM(unloaded_rows) AS sum_unloaded_rows,
    SUM(unloaded_files_size) AS sum_unloaded_files_size
FROM sys_unload_history exec_day
ORDER BY exec_day DESC;
  1. Monitoring external (federated and Spectrum) queries and troubleshooting errors:
SELECT * FROM sys_external_query_detail WHERE query_id=<query_id> ORDER BY query_id,segment_id, start_time;
SELECT * FROM sys_external_query_error;

This query provides segment level information for external queries and details like the number of rows processed, number of bytes processed, and partition info of external tables in Amazon S3 and Postgres.

  1. To review daily summary of Spectrum or external query execution metrics for a week.:
SELECT 
    DATE_TRUNC('day', start_time) as exec_day,
    COUNT(DISTINCT query_id) as total_external_or_spectrum_queries,
    ROUND(AVG(duration/1000000.0), 2) as avg_exec_sec,
    SUM(total_partitions) as sum_total_partitions,
    SUM(qualified_partitions) as sum_qualified_partitions,
    SUM(scanned_files) as sum_scanned_files,
    SUM(returned_rows) as sum_returned_rows,
    ROUND(SUM(returned_bytes)/(1024*1024*1024.0), 2) as sum_returned_gb    
FROM sys_external_query_detail
WHERE start_time > DATEADD(day, -7, CURRENT_DATE)
GROUP BY 1
ORDER BY 1;
  1. Monitoring Datasharing objects and its usage:
SELECT * FROM SYS_DATASHARE_CHANGE_LOG;
SELECT * FROM SYS_DATASHARE_USAGE_CONSUMER;
SELECT * FROM SYS_DATASHARE_USAGE_PRODUCER;

These Views records the consolidated view for tracking changes to data shares on both producer and consumer clusters and Records the activity and usage of Datashares on producer and Consumer.

  1. Monitoring Streaming ingestion queries:
SELECT * FROM SYS_STREAM_SCAN_STATES ORDER BY record_time DESC;
SELECT * FROM SYS_STREAM_SCAN_ERRORS ORDER BY record_time DESC;

Records scan states for records loaded via streaming ingestion and related error information.

  1. Monitoring Sessions and Connections:
  • Current active sessions
SELECT * FROM sys_session_history where status='active' order by start_time;
  • Current active sessions by user.
SELECT user_id,count(*) FROM sys_session_history where status='active' and user_id<>1 group by 1 order by 2 desc; 
  • Connection history for all user sessions.
SELECT * FROM sys_connection_log where session_id=<session_id> ORDER BY record_time desc; 

These queries provide information about user sessions and connections, including session history and connection logs. Replace <session_id> with the actual session ID you want to investigate for a specific session.

  1. Monitoring Transactions:
select * from sys_transaction_history order by transaction_start_time desc;

This query retrieves information about transaction history, including their start time, end time, and other relevant details. It can help you identify long-running transactions or potential issues related to transaction management.

  1. Monitoring User Activities:
Select * from SYS_USERLOG order by record_time desc;

This query provides a log of user activities, such as create, drop and Alter user properties.

  1. To get the database Table Access Frequency Analysis by User and Operation Type:
SELECT 
    u.usename as user_name,
    qh.query_type as query_type,
    qd.table_name as table_name,
    max(qd.end_time) as last_accessed_timestamp,
    COUNT(*) as operation_count
FROM 
    sys_query_history qh
INNER JOIN 
    pg_user u ON qh.user_id = u.usesysid
INNER JOIN 
    SYS_QUERY_DETAIL qd ON qh.query_id = qd.query_id
Where qd.table_name<>'' and qd.table_name is not NULL-- and ilike '%table_name%'
GROUP BY 
    u.usename,
    qh.query_type,
    qd.table_name
ORDER BY 
    operation_count DESC;
  1. Monitor compute usage:
select date_trunc('day',start_time) as days, SUM(compute_seconds) from sys_serverless_usage
group by days
order by 1 desc
  1. Billing and Cost Monitoring:
select trunc(start_time) "Day", 
(sum(charged_seconds)/3600::double precision) * <Price for 1 RPU> as cost_incurred_USD
from sys_serverless_usage 
group by 1 
order by 1;

This query calculates the cost incurred for Redshift Serverless usage on a daily basis. Replace <Price for 1 RPU> with the actual price per RPU (Redshift Processing Unit) hour by referring Redshift pricing for Serverless for respective AWS region. It helps you monitor and manage your billing effectively.

  1. Identifying Long-Running Queries:
WITH cte AS (
    SELECT
        date_trunc('day', start_time) AS query_day,
        user_id,
        query_text,
        ROUND(execution_time / 1000000.0, 2) AS execution_time_seconds,
        ROW_NUMBER() OVER (PARTITION BY date_trunc('day', start_time) ORDER BY execution_time DESC) AS rn
    FROM
        sys_query_history
    WHERE
        start_time >= dateadd('day', -7, current_date) -- Adjust the date range as needed
)
SELECT
    query_day,
    user_id,
    query_text,
    execution_time_seconds
FROM
    cte
WHERE
    rn <= 10
ORDER BY
    query_day DESC,
    execution_time_seconds DESC;

This query identifies the top 10 longest-running queries for each day within the specified date range. It provides the query text, user ID, and execution time in seconds, which can help optimize poorly performing queries. However, the 60-second minimum billing is for the entire Redshift serverless warehouse based on RPU usage seconds, not per query. The primary factor influencing Serverless pricing is the RPU (Resource Processing Unit) usage, although there are other elements contributing to the overall Redshift Serverless billing, some of which are highlighted below.

To maintain data tables hygiene and improve performance, follow these best practices:

  • Periodically run VACUUM on tables with high vacuum_sort_benefit (>10) to enhance query performance.
    • select * from svv_table_info where unsorted > 10 order by vacuum_sort_benefit desc limit 30;
SELECT 
    schema, "table", unsorted,  tbl_rows,vacuum_sort_benefit,
    CASE 
        WHEN unsorted > 10 THEN 'Run: VACUUM ' || schema || '.' || "table" || ';'
        WHEN unsorted <= 10 THEN 'Ok'
        ELSE 'No Value'  
    END AS Analysis
FROM svv_table_info
WHERE schema <> 'pg_internal'
ORDER BY vacuum_sort_benefit DESC, unsorted desc;
  • Periodically run ANALYZE on tables with high stats_off values (>10) to update statistics and generate optimal plans.
    • select * from svv_table_info where stats_off > 10 order by size,stats_off desc limit 30;
SELECT schema, "table",stats_off, tbl_rows,
    CASE 
        WHEN stats_off > 10 THEN 'Run: ANALYZE ' || schema || '.' || "table" || ';'
        WHEN stats_off <= 10 THEN 'Ok'
        ELSE 'No Value'  
    END AS Analysis
FROM svv_table_info
WHERE schema <> 'pg_internal'
ORDER BY tbl_rows DESC, stats_off desc;
  • Review tables with large VARCHAR columns, inefficient sort keys, or encrypted columns as the first sort key column using SVV_TABLE_INFO, as these can impact query performance.
  • Regularly review alerts and recommendations from the Redshift Advisor and implement them to optimize cluster performance.

Best practices to control Cost measures on Redshift Serverless:

Cost Control Measures:


  • Amazon Redshift Serverless billing is based on RPU (Redshift Processing Unit) hours per second, charged only when queries are running and not when idle.

  • Storage is billed separately based on the amount of data stored in Redshift, charged per GB per month.

  • There is a minimum charge of 60 seconds for queries using Serverless, even if they complete in less time and metered on a per-second basis beyond 60 seconds.

  • You can set usage limits (in RPU hours) at the workgroup level to control costs. Actions can be specified when the limit is reached, such as logging, sending alerts, or turning off user queries.

  • For cross-region data sharing, limits can be set in terms of data volume (TB).

  • Query Monitoring Rules (QMR) can be used to control bad queries or those consuming excessive resources by setting limits on execution time, queue time, rows joined, and nested loop join row counts.

  • Canceled queries are still billed for the time the query ran if you abort before it finishes.

Best Practices:


  • Minimize open transactions by properly ending them with the END statement. Open transactions can generate unnecessary RPU usage.

  • Use the SESSION TIMEOUT setting to automatically end open transactions and idle sessions after a specified period of inactivity.

  • Amazon Redshift Serverless has a maximum query runtime of 24 hours and a maximum open transaction period of 6 hours before automatically ending the associated session.

By leveraging these SYS Monitoring Views, SYS Mapping views and provided queries , you can effectively monitor and troubleshoot your queries on Amazon Redshift Serverless.

profile pictureAWS
SUPPORT ENGINEER
published 9 months ago2.9K views