By using AWS re:Post, you agree to the Terms of Use

Can you use a Redshift Spectrum Table in a CTE?

0

Hi! I'm trying to make some data available via Spectrum to our reporting platform. I chose Spectrum because it offers lower latency to our data lake due to not needing a batched ETL process.

One of the queries I have looks like this

with txns as (select * from spectrum_table where ...)
select field1, field2, ...
from txns t1
left join txns t2 on t2.id = t1.id
left join txns t3 on t3.id = t1.id
where...

Intuitively, this should cache the Spectrum query in a temp table with the CTE, and make it available to query later in query without hitting S3 a second (or third) time.

However, I checked the explain plan, and with each join the number of "S3 Seq Scan"s goes up by one. So it appears to do the Spectrum scan each time the CTE is queried.

Questions:

  1. Is this actually happening? Or is the explain plan wrong?

  2. If it is happening, what other options are there to achieve this? Other than manually creating a temp table (this will be accessed by a reporting tool, so I'd prefer to avoid allowing explicit write access or requiring multiple statements)

Thanks!

asked 8 months ago60 views
1 Answer
0

Hi,

I think the doc here is a little misleading:

Each common table expression (CTE) defines a temporary table, which is similar to a view definition. You can reference these temporary tables in the FROM clause.

CTE are an efficient way of defining tables that can be used throughout the execution of a single query. In all cases, the same results can be achieved by using subqueries in the main body of the SELECT statement, but WITH clause subqueries may be simpler to write and read.

Where possible, WITH clause subqueries that are referenced multiple times are optimized as common sub-expressions; that is, it may be possible to evaluate a WITH subquery once and reuse its results.

And the optimization relies on the query optimizer task not the query execution itself.

You can alternatively use materialized views to cache frequently accessed data as described in the following blog:

And since it supports now external tables (https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-redshift-materialized-views-support-external-tables/).

Hope this helps

answered 7 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions