The following query returns 'abcd' instead of an empty result set:
WITH
not_in_cte AS (
SELECT *
FROM big_table
WHERE id NOT IN (SELECT * FROM small_table)
),
left_join_cte AS (
SELECT 'abcd'::varchar(4) AS value
FROM not_in_cte
LEFT JOIN medium_table
ON not_in_cte.id = medium_table.id
)
SELECT *
FROM left_join_cte
WHERE value = 'wxyz'::varchar(4)
LIMIT 1;
Where the three tables (big, medium and small) are defined by:
CREATE TABLE big_table
AS
SELECT
ROW_NUMBER() OVER() AS id
FROM stl_scan
LIMIT 1000;
CREATE TABLE medium_table
AS
SELECT id
FROM big_table
LIMIT 100;
CREATE TABLE small_table
AS
SELECT id
FROM big_table
LIMIT 10;
The following changes "fix" the query so that it returns an empty result set:
- Removing the
WHERE
clause from the first CTE
- Removing the
LEFT JOIN
from the second CTE
I'm not sure about how much of the above is required to reproduce the issue. It's impacted our work a few times recently on seemingly very different queries. In each case, we've been able to distill the issue down to something that looks like the query shared above
Thanks Ziad! Sadly our AWS support plan doesn't include technical support, which is why I posted here. Hoping someone from the redshift dev team checks these periodically
Thank you Chris for the details. Yes this will be reviewed.