Hi,
When issuing queries on a large table Redshift risks running out of disk space.
What are best practices to limit queries to a reasonable disk usage?
This is vital to daily operation and an essential security feature if Redshift is used in conjunction with external services.
After running out of disk space (160GB total, 80GB used) I vacuumed the table and reduced the size from around 80 GB to 1 GB, but I still experience significant spikes in disk usage with simple queries.
Obviously there has to be a way to prevent users from killing the database by issuing a few select queries and I just don't know about it, so I would greatly appreciate your advice.
This example query uses up to 10 GB of disk space for more than a minute:
explain select * from my_schema.mytable order by created_time limit 1;
------------------------------------------------------------------------------------------------------------------------------------------------
XN Limit (cost=1000005105905.92..1000005105905.92 rows=1 width=4527)
-> XN Merge (cost=1000005105905.92..1000005199888.98 rows=37593224 width=4527)
Merge Key: created_time
-> XN Network (cost=1000005105905.92..1000005199888.98 rows=37593224 width=4527)
Send to leader
-> XN Sort (cost=1000005105905.92..1000005199888.98 rows=37593224 width=4527)
Sort Key: created_time
-> XN Seq Scan on event (cost=0.00..375932.24 rows=37593224 width=4527)
------------------------------------------------------------------------------------------------------------------------------------------------
[EDIT]
I limited the query via WLM to only be able to spill 1G to disk, but it does not abort the query even though it takes up way more disk space. The configuration works as expected otherwise - when I limit the time the query can take it aborts as expected. My guess is that it does not consider the way the query takes up disk space as spilling memory to disk. Please confirm if that is correct.
Cheers,
Johannes M.
Oh sorry, maybe I accidentally wrote that it is a CPU spike somwhere. The spike is in the disk size not CPU :) I am quite happy with the current compression rate as it is, but thanks for your recommendation! I will be sure to keep it in mind.