Redshift Auto workload management (Auto WLM) allocates memory for each query based on the declared column widths. If the actual data is much smaller than the declared widths, memory is over-allocated, and that hurts throughput because fewer queries can run concurrently.

You might see the over-allocation in the workmem column of SVL_QUERY_REPORT for the scan step, but there is no easy or direct correlation that by itself confirms the column-sizing issue you are facing. Start with the widest column and work your way down the list, comparing each declared width against the actual data stored in that column.
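A rough way to do that comparison is to check the declared widths against the longest value actually stored, then look at workmem for the scan steps of a recent query. This is only a sketch; the table name, column name, and query ID below are placeholders:

```sql
-- Declared widths of the table's character columns, widest first
-- ('my_table' and 'my_varchar_col' are placeholder names)
SELECT column_name, data_type, character_maximum_length
FROM svv_columns
WHERE table_name = 'my_table'
ORDER BY character_maximum_length DESC NULLS LAST;

-- Longest value actually stored in a candidate column
SELECT MAX(OCTET_LENGTH(my_varchar_col)) AS max_actual_bytes
FROM my_table;

-- Working memory assigned to the scan steps of a specific query
-- (replace 123456 with the query ID from SVL_QLOG or the console)
SELECT segment, step, label, SUM(workmem) AS workmem_bytes
FROM svl_query_report
WHERE query = 123456
  AND label LIKE 'scan%'
GROUP BY segment, step, label
ORDER BY segment, step;
```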
Thank you so much, Milind, for the clarification. So if I understand correctly, allocating unnecessarily large data lengths to the attributes means that fewer queries using those tables can run at any point in time, but individual query performance should not degrade.

The odd thing our team is stating is this: say we have a table with a column whose actual data is 50 bytes long. If we run a query against that table with the defined length matching the data (the same 50 bytes) versus the identical table structure with a bigger defined length (500 bytes), they claim performance degrades by 75% when accessing the table with the bigger defined length, which doesn't sound correct.

I want to test the performance of sample read/write queries in the above scenario against a large volume of sample data (say 50-100 million rows) to prove or disprove the theory. Is there a quick way to create such large tables in Redshift without using COPY from S3, something similar to procedural code with INSERT statements, i.e. a bulk insert within a loop? Is that possible?
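One common approach, sketched below, is to seed a handful of rows and then repeatedly insert the table into itself so the row count doubles on each pass (roughly 24-27 doublings from a 4-row seed gets past 50-100 million rows); a Redshift stored procedure can drive the loop. The table and procedure names here are only placeholders for illustration:

```sql
-- Two structurally identical test tables: only the declared VARCHAR width differs
CREATE TABLE test_narrow (id BIGINT, payload VARCHAR(50));
CREATE TABLE test_wide   (id BIGINT, payload VARCHAR(500));

-- Seed a few rows of ~50-byte values (ids will repeat after doubling, which is fine for this test)
INSERT INTO test_narrow VALUES
  (1, REPEAT('a', 50)),
  (2, REPEAT('b', 50)),
  (3, REPEAT('c', 50)),
  (4, REPEAT('d', 50));

-- Keep doubling the row count until the target is reached
CREATE OR REPLACE PROCEDURE grow_table(target_rows BIGINT)
AS $$
DECLARE
  cnt BIGINT;
BEGIN
  SELECT COUNT(*) INTO cnt FROM test_narrow;
  WHILE cnt < target_rows LOOP
    INSERT INTO test_narrow SELECT id, payload FROM test_narrow;
    SELECT COUNT(*) INTO cnt FROM test_narrow;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

CALL grow_table(50000000);

-- Copy the same data into the wide-column table for the comparison
INSERT INTO test_wide SELECT id, payload FROM test_narrow;
```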
Instead of creating data from scratch, you could use the TPC-DS Benchmark Data set (https://aws.amazon.com/marketplace/pp/prodview-iopazp7irqk6s), a free data set provided by AWS. See https://aws.amazon.com/blogs/big-data/run-a-popular-benchmark-on-amazon-redshift-serverless-easily-with-aws-data-exchange/ for how it has been used. You will not need to follow all the steps in that blog; just use the tpcds1 through tpcds10000 schemas as they fit your own benchmarking test. Also, if this helps close out your original question, please mark the answer as accepted so the community can leverage it.
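If you go that route, here is a rough sketch of how it could feed the narrow/wide comparison discussed above. The store_sales table and its columns are standard TPC-DS names, but the schema name is an assumption; verify what your subscription actually exposes:

```sql
-- Populate the narrow test table from TPC-DS, padding to roughly 50 bytes of payload
-- (tpcds100 is an assumed schema name; adjust to whatever scale factor you mounted)
INSERT INTO test_narrow (id, payload)
SELECT ss_ticket_number,
       RPAD(CAST(ss_item_sk AS VARCHAR), 50, 'x')
FROM tpcds100.store_sales;

-- Same data into the table with the 500-byte declared width
INSERT INTO test_wide SELECT id, payload FROM test_narrow;
```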