How to pick distribution keys and sort keys for Amazon Redshift?

3 minute read
Content level: Expert
1

Amazon Redshift ATO automatically does a lot of data storage optimization for your warehouse uses advanced artificial intelligence methods, but you can follow these simple rules to hit the ground running for your tables to get the best query performance from day one.

Amazon Redshift's Automatic table optimization (ATO) is a self-tuning capability that automatically optimizes the design of tables by applying sort and distribution keys intelligently. Furthermore, ATO continuously observes how queries interact with your tables, and can change sort and distribution keys to optimize performance for the cluster's workload.

Also with Amazon Redshift you can specify the Primary Key (PK) and Foreign Key (FK) to signify the relationships between your tables. These relationships often describe how the tables will be joined to each other, and Amazon Redshift leverages that to pick appropriate distribution keys.

Specify PK and FK for Redshift tables

ATO will pick Distribution Style (DS) and Sort Key (SK) automatically for you.

If those do not work then proceed with below, in order

1. Table mostly used in join on single column, and no filters applied

DS:Key on the join column

SK on the join column

2. Table mostly used in join on multi-column, and no filters applied

DS:Key on the highest cardinality join column

SK on highest cardinality join column

3. Table mostly used in join on single column, and column filters applied

DS:Key on the join column

SK on the most frequent column eliminating most rows

4. Table mostly used in join on multi-column, and column filters applied

DS:Key on the highest cardinality join column

SK on the most frequent column eliminating most rows

5. Relatively LARGE table, mostly NOT used in joins, and column filters applied

DS:Even

SK on the most frequent column eliminating most rows

6. Relatively LARGE table, mostly NOT used in joins, and no filters applied

DS:Even

no selection/AUTO

7. Relatively SMALL table, mostly used with column filters applied

DS:All

SK on the most frequent column eliminating most rows

8. Relatively SMALL table, mostly used with no filters applied

DS:All

no selection/AUTO

IMPORTANT

After manually selecting the DK and SK, you should then alter the tables distribution style to auto and sort key auto. This will make the table ATO eligible again and the ATO will use your current DK/SK selection as baseline.

You can apply these table rules to Materialized Views too!

A table with fewer than 5 Million rows is considered a small table, however if a 100 Million rows table is frequently joined with a 10 Billion rows table then it is considered a relatively small table.

profile pictureAWS
EXPERT
published 9 days ago827 views