Why does Amazon Redshift run the COPY ANALYZE process when STATUPDATE is turned off?

3 minute read
0

I tried to use the COPY command to load data into a new table in Amazon Redshift. I even added the COMPUPDATE OFF parameter. Why does COPY ANALYZE still run?

Short description

Before you create a table in Amazon Redshift, select the distribution style of your table. If a table doesn't specify a distribution style, it defaults to AUTO distribution. Amazon Redshift's AUTO distribution feature then uses the COPY ANALYZE process to determine whether the table should use EVEN or ALL distribution.

Note: The COMPUPDATE OFF parameter still runs COPY ANALYZE on tables using an AUTO distribution style. The STATUPDATE OFF parameter does not disable or repress the COPY ANALYZE command.

Resolution

1.    Verify the current distribution style of a table by running a SHOW TABLE:

SHOW TABLE [schema_name.]table_name

Note: By default, Amazon Redshift applies AUTO distribution if a distribution style isn't specified in the CREATE TABLE statement.

2.    If AUTO distribution is turned on, then create a table with a defined distribution style. Specifying a distribution style prevents COPY ANALYZE from running.

The following example specifies an EVEN distribution style:

create table rs_example_test (id int)diststyle even;

For more information about distribution styles, see Working with data distribution styles.

3.    Use the COPY command to load the data into your table, and then append it with the STATUPDATE OFF parameters.

Note: It's a best practice to use the COPY command to load large amounts of data, instead of using individual INSERT statements. This minimizes the risk of performance issues.

4.    Run a query on the SVL_STATEMENTTEXT system table, and then filter by the process ID (PID) related to your session:

select userid, xid, pid, trim(label) as label, starttime, endtime, sequence, trim(type) as type, left(text,60) text from svl_statementtext where pid = 26815 order by xid, starttime, sequence;

The syntax above checks whether COPY ANALYZE ran as part of the COPY transaction.

The output will look similar to the following:

userid      xid            pid    label      starttime            endtime               sequence     type      text
100       78915    26815   default      2019-03-19  14:06:19    2019-03-19 14:06:21   0             QUERY      copy rs_test3 from 's3://xxxxx-rs-log/AWSLogs/rs_test2.csv'\
100       78915    26815   default      2019-03-19  14:06:21    2019-03-19 14:06:21   0             UTILITY  COMMIT

This confirms that COPY ANALYZE no longer runs as part of the COPY transaction.


Related information

Use a staging table to perform a merge (upsert)

AWS OFFICIAL
AWS OFFICIALUpdated a year ago