So the answer is that each column value that's NULL takes up 1 bit regardless of the data type, and doesn't take up the size of the data type.
Additionally, doing something like NOT NULL DEFAULT 0 to completely remove NULLs will be significantly worse for performance: each 0 takes the full size of the type, and the block min value becomes 0, which forces Redshift to read the entire block for something like SELECT COUNT(*) ... WHERE lots_of_zero > 0
instead of simply taking the count from the block metadata.
The size and query difference can be significant under every type of ENCODE, with a NULL-heavy column being more than 2x smaller than with DEFAULT 0 and scanning 2-10x fewer rows depending on the query.
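A minimal sketch of that comparison, assuming hypothetical tables events_nullable and events_default_zero and a mostly-zero/NULL BIGINT column (the names, ENCODE choice, and behavior comments just restate the claim above, not measured results):

```sql
-- NULL-heavy column: each NULL costs 1 bit, and the block min/max
-- metadata reflects only the non-NULL values actually stored.
CREATE TABLE events_nullable (
    id           BIGINT,
    lots_of_zero BIGINT ENCODE az64          -- mostly NULL
);

-- Same column with NOT NULL DEFAULT 0: every 0 is stored at the full
-- width of the type, and drags each block's min value down to 0.
CREATE TABLE events_default_zero (
    id           BIGINT,
    lots_of_zero BIGINT DEFAULT 0 ENCODE az64 NOT NULL
);

-- On events_nullable, blocks whose metadata shows min > 0 don't need a
-- full scan for this predicate; on events_default_zero, every block has
-- min = 0, so every block must be read.
SELECT COUNT(*) FROM events_nullable    WHERE lots_of_zero > 0;
SELECT COUNT(*) FROM events_default_zero WHERE lots_of_zero > 0;
```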
As there is nothing about it in Redshift's documentation, I would say that it behaves just like the database it was built on: PostgreSQL.
Check this SO question about how NULL is stored in PostgreSQL.
You could check whether creating 9 columns does indeed take more space, to confirm that Redshift really works like PostgreSQL in this case.
Hope this clarifies things a bit. If you come across anything different, please share it here, I would love to know!
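If you want to try that 9-column check in stock PostgreSQL, here is a minimal sketch (table names and row counts are made up; the 8-vs-9 threshold comes from the per-row null bitmap, which fits in the existing tuple-header padding for up to 8 columns):

```sql
-- Two tables that differ only in having 8 vs. 9 nullable columns.
CREATE TABLE eight_cols (c1 int, c2 int, c3 int, c4 int,
                         c5 int, c6 int, c7 int, c8 int);
CREATE TABLE nine_cols  (c1 int, c2 int, c3 int, c4 int,
                         c5 int, c6 int, c7 int, c8 int, c9 int);

-- Fill both with all-NULL rows so only headers and null bitmaps are stored.
INSERT INTO eight_cols
SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
FROM generate_series(1, 100000);
INSERT INTO nine_cols
SELECT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
FROM generate_series(1, 100000);

-- If NULLs cost bits in a bitmap rather than bytes per column, nine_cols
-- should be noticeably larger per row: the 9th column forces an extra,
-- alignment-padded null-bitmap byte.
SELECT pg_size_pretty(pg_total_relation_size('eight_cols')) AS eight_cols_size,
       pg_size_pretty(pg_total_relation_size('nine_cols'))  AS nine_cols_size;
```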
Looked into it: they take up 1 bit per value, and interestingly, swapping 0 for NULL lets your block ranges start at a min of 1, which can considerably improve query performance, since column-level blocks can be skipped and the block metadata can be used directly.
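If you want to see those block ranges yourself, you can inspect the per-block min/max metadata Redshift keeps in the STV_BLOCKLIST system table (a sketch; 'my_table' is a placeholder, and minvalue/maxvalue are only meaningful for numeric-like columns):

```sql
-- Per-block (1 MB) min/max metadata for one table. With NOT NULL DEFAULT 0
-- every block shows minvalue = 0; after swapping 0 for NULL, the minimum
-- stored value can be 1, so blocks can be skipped for predicates like
-- "lots_of_zero > 0".
SELECT b.col, b.blocknum, b.num_values, b.minvalue, b.maxvalue
FROM stv_blocklist b
JOIN stv_tbl_perm p
  ON b.tbl = p.id AND b.slice = p.slice
WHERE p.name = 'my_table'   -- placeholder table name
ORDER BY b.col, b.blocknum;
```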