Use Redshift Spectrum to query both a Redshift table and an S3 file

A customer receives external data in S3 (one file of about 100 MB per day). They need to generate a report that uses data from both a Redshift table and the S3 file. They are asking whether it is possible to query the Redshift table and S3 at the same time via Redshift Spectrum, without loading the S3 file into Redshift. If so, is that a best practice? What are the pros and cons?

asked 4 years ago, 489 views
1 Answer
Accepted Answer

Hi,

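Yes, it is possible to query, and for example join, data from the Redshift cluster and data in S3 in a single query. Redshift Spectrum external tables let you query the data in S3 in place, without loading it into the cluster. See the AWS documentation on querying with Redshift Spectrum.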
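To make the idea concrete, here is a minimal sketch in Python that only builds the SQL statements involved: an external schema, an external table over the S3 files, and a join against a local table. Every name in it (`spectrum_schema`, `spectrum_db`, `daily_feed`, `public.customers`, the column list, the bucket path and the IAM role ARN) is a hypothetical placeholder, and the CSV layout is an assumption about the customer's file.

```python
# Sketch only: builds the SQL text, does not connect to any cluster.
# All identifiers, the S3 path and the role ARN are hypothetical placeholders.

def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """DDL that maps a Redshift external schema to a data catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema: str, table: str, s3_location: str) -> str:
    """DDL for an external table over comma-delimited files (assumed layout)."""
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"  customer_id INTEGER,\n"
        f"  amount DECIMAL(12,2),\n"
        f"  event_date DATE\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}';"
    )


def report_query_sql(schema: str, table: str, local_table: str) -> str:
    """Report query joining a local Redshift table with the external table."""
    return (
        f"SELECT c.customer_name, SUM(f.amount) AS total_amount\n"
        f"FROM {local_table} AS c\n"
        f"JOIN {schema}.{table} AS f ON f.customer_id = c.customer_id\n"
        f"GROUP BY c.customer_name;"
    )


if __name__ == "__main__":
    print(external_schema_sql(
        "spectrum_schema", "spectrum_db",
        "arn:aws:iam::123456789012:role/ExampleSpectrumRole"))
    print(external_table_sql(
        "spectrum_schema", "daily_feed", "s3://example-bucket/daily-feed/"))
    print(report_query_sql("spectrum_schema", "daily_feed", "public.customers"))
```

The generated statements could be run from any SQL client connected to the cluster. Because the external table points at an S3 prefix rather than a single file, each new daily file placed under that prefix would be picked up by the next query.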

Pros

  • Querying the data in place can save cost. The larger and the more infrequently accessed your data set in S3 is, the more cost-efficient Spectrum becomes.
  • Flexibility. Querying the data in place also means that the data in S3 remains easily accessible to other applications, such as ML or big data processing with EMR, without the need to integrate with a data warehouse. That leaves the data warehouse to do what it is meant to do, namely reporting.

Cons

  • Less predictable costs, because Spectrum queries are billed on top of the Redshift cluster cost (at the time of writing, $5 per TB scanned).
  • Spectrum might be slower than other options, e.g. Athena or querying data loaded into Redshift.
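For a sense of scale, the quoted rate can be applied to the file size from the question. The sketch below is a back-of-the-envelope estimate only: it assumes decimal units (1 TB = 1,000,000 MB), assumes every query scans the files in full, and ignores any other billing rules. The function name `scan_cost_usd` is made up for this example.

```python
# Rough estimate of Spectrum scan cost at the $5 per TB rate quoted above.
# Assumptions: decimal units, full scan of each file, no other billing rules.

MB_PER_TB = 1_000_000


def scan_cost_usd(mb_scanned: float, price_per_tb_usd: float = 5.0) -> float:
    """Cost in USD of scanning `mb_scanned` megabytes."""
    return mb_scanned / MB_PER_TB * price_per_tb_usd


daily_file_mb = 100  # file size from the question

# One report over a single day's file.
cost_one_day = scan_cost_usd(daily_file_mb)

# One report that scans 30 days of accumulated files.
cost_thirty_days = scan_cost_usd(daily_file_mb * 30)

if __name__ == "__main__":
    print(f"one daily file: ${cost_one_day:.4f} per query")
    print(f"30 daily files: ${cost_thirty_days:.4f} per query")
```

Under these assumptions a query over one daily file costs about $0.0005, so at this data volume the scan charge is small. It grows with the amount of data each query scans and with how often the report runs.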

Hope that helps!

AWS
Manos_S
answered 4 years ago
EXPERT
reviewed a month ago
