How do I correlate the query plan with the query report in Amazon Redshift?

I want to correlate the query plan with the query report in my Amazon Redshift cluster.

Short description

To determine the usage that's required to run a query in Amazon Redshift, run the EXPLAIN command. The execution plan that the EXPLAIN command returns outlines the query planning and execution steps involved. Then, use the SVL_QUERY_REPORT system view to view query information at the cluster slice level. You can use the slice-level information to detect uneven data distribution across the cluster that might affect query performance.

Amazon Redshift processes the query plan and translates the plan into steps, segments, and streams. For more information, see Query planning and execution workflow.
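
If you want to see that breakdown for a specific query before drilling into slice-level detail, one option is the SVL_QUERY_SUMMARY system view, which rolls SVL_QUERY_REPORT up across slices. The following is a minimal sketch; it assumes that view's stm (stream), seg (segment), and step columns, and 938787 is a placeholder query ID:

-- Sketch: list one query's streams (stm), segments (seg), and steps.
-- Replace 938787 with your query's ID.
select stm, seg, step, label, rows, bytes
from svl_query_summary
where query = 938787
order by stm, seg, step;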

Resolution

Create tables and get the execution plan and SVL query report for the query

To create the tables and get the execution plan and SVL query report, complete the following steps:

  1. Create two tables with different sort keys and distribution keys (example DDL appears after this list).

  2. Run the following query where the join isn't performed on the distribution key:

    select eventname, sum (pricepaid) from sales, event where sales.eventid = event.eventid group by eventname order by 2 desc;

    This query broadcasts the inner table to all compute nodes.

  3. Retrieve the query plan:

    EXPLAIN <query>;
                                                   QUERY PLAN                                               
    --------------------------------------------------------------------------------------------------------
    XN Merge  (cost=1002815368414.24..1002815368415.67 rows=571 width=27)
       Merge Key: sum(sales.pricepaid)
       ->  XN Network  (cost=1002815368414.24..1002815368415.67 rows=571 width=27)
             Send to leader
             ->  XN Sort  (cost=1002815368414.24..1002815368415.67 rows=571 width=27)
                   Sort Key: sum(sales.pricepaid)
                   ->  XN HashAggregate  (cost=2815368386.67..2815368388.10 rows=571 width=27)
                         ->  XN Hash Join DS_BCAST_INNER  (cost=109.98..2815367496.05 rows=178125 width=27)
                               Hash Cond: ("outer".eventid = "inner".eventid)
                               ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=14)
                               ->  XN Hash  (cost=87.98..87.98 rows=8798 width=21)
                                     ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=21)
    (12 rows)
  4. Run the following query against the SVL_QUERY_REPORT system view to get the query report:

    select * from svl_query_report where query = query_id order by segment, step, elapsed_time, rows;

    **Note:** Replace query_id with your query's ID. One way to look up the ID is shown in the sketch after this list.
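
For step 1, the example joins sales and event tables like the ones in the Amazon Redshift TICKIT sample database. The following is a minimal DDL sketch, not the full sample schema: the column lists are trimmed to what the example query uses, and the distribution and sort key choices are illustrative only, picked so that the two tables don't share a distribution key on the join column (eventid):

-- Sketch: two tables whose join column (eventid) is the distribution key on
-- only one side, so a join between them can't be fully collocated.
create table event (
    eventid   integer not null,
    eventname varchar(200)
)
diststyle key
distkey (eventid)
sortkey (eventid);

create table sales (
    salesid   integer not null,
    eventid   integer not null,
    pricepaid decimal(8,2),
    saletime  timestamp
)
diststyle key
distkey (salesid)   -- distributed on salesid, not on the join column eventid
sortkey (saletime);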
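
The note in step 4 asks for the query ID. One common way to find it is to look the statement up in the STL_QUERY system table, which records the IDs and text of recently run queries. A minimal sketch, with the LIKE pattern standing in for the start of your own statement text:

-- Sketch: find the query ID of a recently run statement by its text.
select query, starttime, trim(querytxt) as sql_text
from stl_query
where querytxt like 'select eventname, sum (pricepaid)%'
order by starttime desc
limit 5;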

Map the query plan to the query report

To map the query plan to the query report, complete the following steps:

  1. Run the following query to get the svl_query_report of the query with a segment value of 0:
    select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 0 order by segment, step, elapsed_time, rows;
    
    This segment corresponds to the following part of the EXPLAIN <query> output:
    ->  XN Hash  (cost=87.98..87.98 rows=8798 width=21)
          ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=21)
    The following is example output:
    query  | slice | segment | step |         start_time         |         end_time          | elapsed_time | rows | bytes  |            label              
    -------+-------+---------+------+----------------------------+---------------------------+--------------+------+--------+------------------------------
    938787 |     0 |       0 |    0 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4383 | 128626 | scan   tbl=278788 name=event
    938787 |     1 |       0 |    0 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4415 | 128918 | scan   tbl=278788 name=event
    938787 |     0 |       0 |    1 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4383 |      0 | project                     
    938787 |     1 |       0 |    1 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4415 |      0 | project                     
    938787 |     0 |       0 |    2 | 2020-05-22 11:11:48.828309 | 2020-05-22 11:11:48.82987 |         1561 | 4383 | 126660 | bcast                      
    ...
    
    (6 rows)
    In the preceding output, when the segment value is 0, Amazon Redshift performs a sequential scan operation to scan the event table. You can find the sequential scan operation in the label column.
  2. Run the following query to get the svl_query_report of the query with a segment value of 1:
    select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 1 order by segment, step, elapsed_time, rows;
    The following is example output:
    query  | slice | segment | step |       start_time          |          end_time          | elapsed_time | rows | bytes  |     label           
    -------+-------+---------+------+---------------------------+----------------------------+--------------+------+--------+-------------------------------------------
    938787 |     1 |       1 |    0 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 |         3173 |    0 |      0 | scan   tbl=376297 name=Internal Worktable  
    938787 |     0 |       1 |    0 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.831142 |         4278 | 8798 | 253580 | scan   tbl=376297 name=Internal Worktable
    938787 |     1 |       1 |    1 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 |         3173 |    0 |      0 | project                                   
    938787 |     0 |       1 |    1 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.831142 |         4278 | 8798 |      0 | project                                   
    938787 |     1 |       1 |    2 | 2020-05-22 11:11:48.826864 | 2020-05-22 11:11:48.830037 |         3173 |    0 |      0 | hash   tbl=439                            
    ...                        
    
    (6 rows)
    The query continues to run through the segment with a value of 1, where a hash table operation is performed on the inner table of the join.
  3. Run the following query to get the svl_query_report of the query with a segment value of 2:
    select query,slice,segment,step,start_time,end_time,elapsed_time,rows,bytes,label from svl_query_report where query = 938787 and segment = 2 order by segment, step, elapsed_time, rows;
    
    This segment corresponds to the following part of the EXPLAIN <query> output:
    ->  XN Hash Join DS_BCAST_INNER  (cost=109.98..2815367496.05 rows=178125 width=27)
          Hash Cond: ("outer".eventid = "inner".eventid)
          ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=14)
    The following is example output:
    query  | slice | segment | step |         start_time         |          end_time          | elapsed_time | rows  |  bytes  |            label             
    --------+-------+---------+------+----------------------------+----------------------------+--------------+-------+---------+------------------------------
    938787 |     1 |       2 |    0 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 | 1730380 | scan   tbl=278792 name=sales  
    938787 |     0 |       2 |    0 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 | 1718740 | scan   tbl=278792 name=sales  
    938787 |     1 |       2 |    1 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | project                       
    938787 |     0 |       2 |    1 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | project                       
    938787 |     1 |       2 |    2 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | project                       
    938787 |     0 |       2 |    2 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | project                       
    938787 |     1 |       2 |    3 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | hjoin  tbl=439                
    938787 |     0 |       2 |    3 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | hjoin  tbl=439                
    938787 |     1 |       2 |    4 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | project                       
    938787 |     0 |       2 |    4 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | project                       
    938787 |     1 |       2 |    5 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 | 86519 |       0 | project                       
    938787 |     0 |       2 |    5 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 | 85937 |       0 | project                       
    938787 |     1 |       2 |    6 | 2020-05-22 11:11:48.839297 | 2020-05-22 11:11:48.865857 |        26560 |   576 |   34916 | aggr   tbl=448                
    938787 |     0 |       2 |    6 | 2020-05-22 11:11:48.838371 | 2020-05-22 11:11:48.865857 |        27486 |   576 |   34916 | aggr   tbl=448                
    ...                        
    
    (16 rows)
    In the preceding example, the query runs when the segment value is 2, and a sequential scan operation is performed to scan the sales table. In the same segment, an aggregation operation is performed to aggregate results, and a hash join operation is performed to join the tables. The join column for one of the tables isn't a distribution key or a sort key. As a result, the inner table is distributed to all compute nodes as DS_BCAST_INNER, which you can see in the preceding execution plan. You can also run this query to get the SVL_QUERY_REPORT of the query with segment values of 3, 4, and 5.

In these segments, a hash aggregate operation and a sort operation are performed, identified by the 'aggr' and 'sort' labels. The hash aggregate operation is performed on unsorted, grouped aggregate functions, and the sort operation is performed to evaluate the ORDER BY clause.

After all the segments are used, the query runs a network operation on segments 4 and 5 to send the intermediate results to the leader node, where they are processed further. You can identify these results by the 'return' label.
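
Each segment and step in these reports appears once per slice, and the rows column shows how much work each slice did; comparing those per-slice counts is how you detect the uneven data distribution mentioned in the short description. The following sketch summarizes the whole query that way, including the 'aggr', 'sort', and 'return' steps described above, and uses only columns that already appear in the reports:

-- Sketch: compare per-slice row counts for each step to check for skew.
-- A large gap between min_rows and max_rows suggests uneven distribution.
select segment, step, label,
       min(rows) as min_rows,
       max(rows) as max_rows,
       count(distinct slice) as slices
from svl_query_report
where query = 938787
group by segment, step, label
order by segment, step;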

After the query completes, run the following query to check the query's execution time in milliseconds:

select datediff (ms, exec_start_time, exec_end_time) from stl_wlm_query where query= 938787;

date_diff
-----------
101
(1 row)
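
To see how that total breaks down by segment, you can roll up the start and end times from SVL_QUERY_REPORT with the same datediff approach. A minimal sketch:

-- Sketch: approximate each segment's duration from the slice-level report.
select segment,
       datediff(ms, min(start_time), max(end_time)) as segment_ms
from svl_query_report
where query = 938787
group by segment
order by segment;

Because segments can run concurrently, these per-segment durations can overlap, so they don't necessarily add up to the total execution time.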

Optimize your query

When you analyze the query plan, you can adjust your query's performance for your use case. For more information, see Top 10 performance tuning techniques for Amazon Redshift.
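
For this particular example, the plan itself points at the main tuning opportunity: the DS_BCAST_INNER broadcast happens because the join column isn't the distribution key on both tables. One possible adjustment, shown only as a sketch (it assumes the illustrative DDL earlier, where event is already distributed on eventid, and whether it helps depends on the rest of your workload), is to redistribute the other table on the join column so the join can be collocated:

-- Sketch: distribute sales on the join column so the join with event no
-- longer needs to broadcast the inner table.
alter table sales alter distkey eventid;

-- Re-check the plan; if both tables are distributed on eventid, the join
-- typically shows DS_DIST_NONE instead of DS_BCAST_INNER.
explain
select eventname, sum (pricepaid)
from sales, event
where sales.eventid = event.eventid
group by eventname
order by 2 desc;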

Related information

Mapping the query plan to the query summary

Reviewing query plan steps

Using the SVL_QUERY_REPORT view
