By using AWS re:Post, you agree to the Terms of Use

Use Redshift Spectrum to query both a Redshift table and an S3 file

A customer receives external data in S3 (one file of about 100 MB per day). They need to generate a report that combines data from a Redshift table and the S3 file. They are asking whether it is possible to query the Redshift table and S3 at the same time via Redshift Spectrum, without loading the S3 file into Redshift. If so, is this a best practice, and what are the pros and cons?

asked 2 years ago

Accepted Answer

Hi,

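It is possible to query, and for example join, data from the Redshift cluster and S3 in a single query. Redshift Spectrum external tables let you query data where it sits in S3, so the daily file does not have to be loaded into the cluster first. For details, see "Querying using Redshift Spectrum" in the Amazon Redshift documentation.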
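As a minimal sketch of what this involves, the Python snippet below only builds the SQL text (it does not connect to Redshift): an external schema backed by the AWS Glue Data Catalog, an external table over the S3 prefix, and one report query that joins a local table with the external one. Every name here is a hypothetical placeholder, not something from the question: the `spectrum` schema, the Glue database `external_feed`, the IAM role ARN, the bucket path, the local `public.customers` table and all columns. It also assumes the daily file is comma-delimited text without a header row. The generated statements would be run from a SQL client or through the Redshift Data API.

```python
# Hypothetical placeholders: schema, Glue database, IAM role ARN, S3 path,
# table and column names are illustrative only.
EXTERNAL_SCHEMA = "spectrum"
GLUE_DATABASE = "external_feed"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/MySpectrumRole"
S3_LOCATION = "s3://example-bucket/daily-feed/"


def create_external_schema_sql() -> str:
    """Map a Glue Data Catalog database into Redshift as an external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {EXTERNAL_SCHEMA}\n"
        f"FROM DATA CATALOG DATABASE '{GLUE_DATABASE}'\n"
        f"IAM_ROLE '{IAM_ROLE_ARN}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def create_external_table_sql() -> str:
    """Define a table over comma-delimited text files in S3; no data is loaded."""
    return (
        f"CREATE EXTERNAL TABLE {EXTERNAL_SCHEMA}.daily_feed (\n"
        "    customer_id INT,\n"
        "    event_date DATE,\n"
        "    amount DECIMAL(12,2)\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{S3_LOCATION}';"
    )


def report_query_sql() -> str:
    """Join a local Redshift table with the S3-backed external table in one query."""
    return (
        "SELECT c.customer_name, SUM(f.amount) AS total_amount\n"
        "FROM public.customers AS c\n"
        f"JOIN {EXTERNAL_SCHEMA}.daily_feed AS f\n"
        "  ON c.customer_id = f.customer_id\n"
        "GROUP BY c.customer_name;"
    )


if __name__ == "__main__":
    for statement in (
        create_external_schema_sql(),
        create_external_table_sql(),
        report_query_sql(),
    ):
        print(statement, end="\n\n")
```

The external schema and table only need to be created once; new daily files dropped under the same S3 prefix are picked up by later queries without reloading anything.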

Pros

  • Cost savings. Querying the data in place avoids loading and storing it in the cluster. The larger and less frequently accessed your data set in S3, the more cost-efficient Spectrum becomes.
  • Flexibility. Because the data is queried in place, it stays easily accessible in S3 to other applications, such as ML or big data processing with EMR, without having to integrate them with the data warehouse. That leaves the warehouse to do what it is meant for: reporting.

Cons

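  • Less predictable costs, because Spectrum queries are billed on top of the Redshift cluster cost (at the time of writing, $5 per TB of data scanned).
  • Performance. Spectrum queries over S3 data can be slower than queries against data loaded into local Redshift tables, and depending on the workload may also be slower than alternatives such as Athena.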
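To put the per-TB price in perspective for a file of about 100 MB per day, here is a rough back-of-the-envelope estimate in Python. It assumes decimal units (1 TB = 1,000,000 MB), the $5 per TB figure quoted above, and Spectrum's billing rule of rounding each query's scan up to the next MB with a 10 MB minimum; the query counts are hypothetical, and current pricing should be checked before relying on any number.

```python
import math

# Assumptions: decimal units, the $5/TB price quoted above, and per-query
# billing rounded up to the next MB with a 10 MB minimum.
PRICE_PER_TB_USD = 5.0
MB_PER_TB = 1_000_000
MIN_MB_PER_QUERY = 10


def spectrum_query_cost(mb_scanned: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimated Spectrum charge (USD) for one query scanning `mb_scanned` MB in S3."""
    billed_mb = max(math.ceil(mb_scanned), MIN_MB_PER_QUERY)
    return billed_mb / MB_PER_TB * price_per_tb


def monthly_cost(mb_per_query: float, queries_per_day: int, days: int = 30) -> float:
    """Estimated charge for running the same report `queries_per_day` times a day."""
    return spectrum_query_cost(mb_per_query) * queries_per_day * days


if __name__ == "__main__":
    # One query scanning a single ~100 MB daily file.
    print(f"per query: ${spectrum_query_cost(100):.4f}")
    # Hypothetical: the report runs 10 times a day for 30 days, each scanning one file.
    print(f"per month: ${monthly_cost(100, queries_per_day=10):.2f}")
```

With these assumptions, one scan of a 100 MB file comes to about $0.0005 and the hypothetical monthly workload to about $0.15. Because the charge scales with bytes scanned, rerunning the estimate with larger scan sizes shows how costs grow as queries start to read the accumulated history in S3.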

Hope that helps!

answered 2 years ago
