Why does a table in an Amazon Redshift cluster consume more or less disk storage space than expected?

4 minute read
0

A table is occupying more disk space than expected or a percentage of free disk space did not increase after my classic resize. How does Amazon Redshift calculate disk storage and table size?

Resolution

Checking the minimum table size

The minimum table size is the smallest footprint that a table has on an Amazon Redshift cluster. You can check the minimal table size when analyzing the cluster storage use or when resizing an Amazon Redshift cluster.

For tables created using the KEY, EVEN, or Auto (EVEN) distribution style, use the following formula:

Minimum table size = block_size (1 MB) *
(number_of_user_columns + 3 system columns) * number_of_populated_slices * number_of_table_segments

For tables created using the ALL or Auto (ALL) distribution style, use the following formula:

Minimum table size = block_size (1 MB) *
(number_of_user_columns + 3 system columns) * number_of_cluster_nodes * number_of_table_segments

For the table formulas, the number of segments is based on whether the table has a defined sort key. If an Amazon Redshift table has a defined sort key, then the table has two segments: one sorted segment and one unsorted segment. If an Amazon Redshift table has no sort key, then the table produces only one unsorted segment.

You can calculate the number of populated slices using the following query:

select count(distinct a.slice) as
number_of_populated_slices, b."table" from stv_blocklist a,
svv_table_info b where a.tbl = b.table_id group by b."table" ;

Example: six dc2.large cluster

For example, a six dc2.large cluster can be created with four small tables of the same structure and number of rows. If the cluster uses three different distribution styles and one distribution style with a sort key, then different queries are used.

The following query creates a table with an ALL distribution style (and an output of six populated slices):

create table testsize_all (a varchar(100),b varchar(100), c
varchar(100)) diststyle all;
insert into testsize_all values
('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c');

The following query creates a table with an EVEN distribution style (and an output of six populated slices):

create table testsize_even (a varchar(100),b varchar(100), c
varchar(100)) diststyle even;
insert into testsize_even values
('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c');

The following query creates a table with a distribution key (and an output of one populated slice):

create table testsize_key (a varchar(100),b varchar(100), c
varchar(100)) distkey (a);
insert into testsize_key values
('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c');

The following query creates a table with a distribution key and sort key (and an output of six populated slices):

create table testsize_sort_even (a varchar(100),b
varchar(100), c varchar(100) ) diststyle even sortkey (a);
insert into testsize_sort_even values
('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c'),('a','b','c');

Calculating the minimum table size

To calculate the minimum table size for an EVEN distribution style, use the following formula:

Minimum table size = block_size (1 MB) *
(number_of_user_columns + 3 system columns) * number_of_populated_slices *
number_of_table_segments
1MB * (3+3) * 6 *1 = 36MB

To calculate the minimum table size for an ALL distribution style, use the following formula:

Minimum table size = block_size (1 MB) *
(number_of_user_columns + 3 system columns) * number_of_cluster_nodes *
number_of_table_segments
1MB * (3+3) * 6 *1 = 36 MB

To calculate the minimum table size for a KEY distribution style, use the following formula:

Minimum table size = block_size (1 MB) * (number_of_user_columns
+ 3 system columns) * number_of_populated_slices * number_of_table_segments
1MB * (3+3) * 1 *1 = 6MB

To calculate the minimum table size for an even distribution with a sort key, use the following formula:

Minimum table size = block_size (1 MB) *
(number_of_user_columns + 3 system columns) * number_of_populated_slices *
number_of_table_segments
1MB * (3+3) * 6 *2 = 72MB

As the calculations indicate, with a small number of rows inserted, the size of your table is larger than expected. The size of the table continues to grow as the number of rows are inserted and the number of populated slices grow.

When a classic resize is performed, the number of populated slices grows without a growth in data volume for the table. As a result, the amount of free space after the resize doesn't grow linearly.


Related information

Amazon Redshift best practices for designing tables

Working with automatic table optimization

Columnar storage

Choose the best sort key

Query planning and execution workflow

AWS OFFICIAL
AWS OFFICIALUpdated 2 years ago