Use Redshift Spectrum to query both a Redshift table and an S3 file

A customer receives external data in S3 (one file of about 100 MB per day). They need to generate a report that combines data from a Redshift table with data from the S3 file. They are asking whether it is possible to query the Redshift table and S3 at the same time via Redshift Spectrum, without loading the S3 file into Redshift. If so, is that a best practice? What are the pros and cons?

asked 4 years ago · 489 views
1 Answer
Accepted Answer

Hi,

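Yes, it is possible to query, and for example join, data from the Redshift cluster and S3. Redshift Spectrum external tables let you query the data in S3 in place, so a single query can combine them with regular Redshift tables without loading the file first. For details, see "Querying using Redshift Spectrum" in the documentation.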
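As a rough illustration of the join described above, here is a minimal Python sketch that only builds the SQL text: an external schema, an external table over the S3 prefix, and a query joining it to a local table. Everything named here is an assumption for illustration and not from the customer's setup: the schema `spectrum_schema`, the catalog database `spectrum_db`, the tables `daily_feed` and `public.customers`, their columns, the CSV file format, the IAM role ARN and the bucket path.

```python
def build_spectrum_statements(iam_role_arn, s3_location,
                              schema="spectrum_schema",
                              catalog_db="spectrum_db"):
    """Return SQL text for a Spectrum setup plus a join query.

    All object names and columns are hypothetical placeholders.
    """
    # External schema backed by the AWS Glue / Athena data catalog.
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )
    # External table over the daily files; CSV layout is an assumption.
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.daily_feed (\n"
        "    customer_id INTEGER,\n"
        "    amount DECIMAL(12,2),\n"
        "    event_date DATE\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )
    # Join the local Redshift table with the S3-backed external table.
    join_query = (
        "SELECT c.customer_name, SUM(f.amount) AS total_amount\n"
        "FROM public.customers AS c\n"
        f"JOIN {schema}.daily_feed AS f\n"
        "  ON f.customer_id = c.customer_id\n"
        "GROUP BY c.customer_name;"
    )
    return [create_schema, create_table, join_query]


if __name__ == "__main__":
    statements = build_spectrum_statements(
        "arn:aws:iam::123456789012:role/ExampleSpectrumRole",  # placeholder
        "s3://example-bucket/daily-feed/",                     # placeholder
    )
    for statement in statements:
        print(statement, end="\n\n")
```

The printed statements can be run from any SQL client connected to the cluster. The IAM role attached to the cluster needs read access to the S3 location and to the data catalog.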

Pros

  • Querying the data in place can save costs. The larger and the less frequently accessed your data set in S3 is, the more cost-efficient Spectrum becomes.
  • Flexibility. Querying the data in place also means that the data in S3 stays easily accessible to other applications, such as ML or big data processing with EMR, without having to integrate with the data warehouse. That leaves the data warehouse to do what it is meant to do, i.e. reporting.

Cons

  • Less predictable costs, because Spectrum queries are billed on top of the Redshift cluster, currently at $5 per TB scanned.
  • Spectrum queries might be slower than the alternatives, e.g. Athena or data loaded into Redshift tables.
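To put the per-TB price in perspective for this workload, here is a small back-of-the-envelope sketch. It assumes the $5 per TB figure quoted above, 1 TB = 1024^4 bytes, and that every byte of the file is scanned (no partition pruning or columnar format); it ignores any per-query minimums or rounding, so check the current pricing page before relying on it.

```python
PRICE_PER_TB_USD = 5.0      # figure quoted in the answer; may have changed
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte
BYTES_PER_MB = 1024 ** 2


def spectrum_scan_cost_usd(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the Spectrum charge for a given number of bytes scanned."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # One 100 MB file per day, each scanned in full once, over 30 days.
    monthly_bytes = 100 * BYTES_PER_MB * 30
    print(f"Estimated cost: ${spectrum_scan_cost_usd(monthly_bytes):.4f}")
```

Under these assumptions, scanning each daily file once adds up to a cent or two per month. The bill grows when reports repeatedly rescan the accumulated history, which is where partitioning the S3 data by date would help.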

Hope that helps!

AWS
Manos_S
answered 4 years ago
EXPERT
reviewed a month ago
