Use Redshift Spectrum to query both a Redshift table and an S3 file


A customer receives external data (one file of about 100 MB per day) in S3. They need to generate a report using data from both a Redshift table and the S3 file. They are asking whether it is possible to query the Redshift table and S3 at the same time via Redshift Spectrum, without loading the S3 file into Redshift. If so, is that a best practice? What are the pros and cons?

Asked 4 years ago · 489 views
1 Answer
Accepted Answer

Hi,

Yes, it is possible to query, and for example join, data from the Redshift cluster and S3 in a single statement. Redshift Spectrum external tables let you query the data in S3 in place, without loading it into the cluster. See: Querying using Redshift Spectrum.
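As a rough illustration of the join described above, the sketch below holds the three SQL statements such a setup typically needs: an external schema, an external table over the S3 files, and a query joining it to a local table. Every name in it (schema, database, tables, columns, bucket, IAM role ARN) is a hypothetical placeholder, and the CSV layout of the daily file is an assumption. The Python wrapper only prints the statements; running them requires a SQL client connected to the cluster.

```python
# Sketch only: all identifiers, the bucket path, and the role ARN are
# hypothetical placeholders, and the file is assumed to be comma-delimited text.

CREATE_EXTERNAL_SCHEMA = """
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
"""

CREATE_EXTERNAL_TABLE = """
CREATE EXTERNAL TABLE spectrum_schema.daily_feed (
    customer_id INTEGER,
    event_date  DATE,
    amount      DECIMAL(12, 2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-bucket/daily-feed/';
"""

# Joins a local Redshift table (public.customers) with the external S3 table.
JOIN_QUERY = """
SELECT c.customer_name, f.event_date, SUM(f.amount) AS total_amount
FROM public.customers AS c
JOIN spectrum_schema.daily_feed AS f
  ON f.customer_id = c.customer_id
GROUP BY c.customer_name, f.event_date;
"""


def report_statements():
    """Return the statements in the order they would be run."""
    return [s.strip() for s in (CREATE_EXTERNAL_SCHEMA, CREATE_EXTERNAL_TABLE, JOIN_QUERY)]


if __name__ == "__main__":
    for statement in report_statements():
        print(statement, end="\n\n")
```

The schema and table definitions are one-time setup; only the final SELECT would run for each report.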

Pros

  • Querying the data in place can save cost. The larger and the more infrequently accessed your data set in S3 is, the more cost-efficient Spectrum becomes.
  • Flexibility. Querying the data in place also means that the data in S3 remains easily accessible to other applications, such as ML or big data processing with EMR, without the need to integrate with a data warehouse. This leaves the data warehouse to do what it is meant to do, i.e. reporting.

Cons

  • Less predictable costs, because Spectrum queries are billed on top of the Redshift cluster cost, currently at $5 per TB scanned.
  • Spectrum might be slower than other solutions, e.g. Athena or querying data loaded into Redshift.
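To put the per-TB charge in context for the 100 MB/day file from the question, here is a back-of-the-envelope sketch. It uses the $5 per TB figure quoted above and assumes 1 TB = 1024 × 1024 MB, no compression, and no partition pruning; it ignores any per-query minimum or rounding rules, so check current pricing before relying on the numbers.

```python
# Rough estimate only. The price is the figure quoted in this answer;
# the unit conversion and the "full scan" model are assumptions.
PRICE_PER_TB_USD = 5.0
MB_PER_TB = 1024 * 1024  # assumption: binary units


def scan_cost_usd(scanned_mb: float) -> float:
    """Cost of one Spectrum query that scans `scanned_mb` megabytes."""
    return scanned_mb / MB_PER_TB * PRICE_PER_TB_USD


daily_file_mb = 100  # file size from the question

# A report that reads only the latest file.
cost_one_day = scan_cost_usd(daily_file_mb)

# A report that scans a full year of accumulated daily files.
cost_one_year = scan_cost_usd(daily_file_mb * 365)

print(f"Query over one daily file:     ${cost_one_day:.6f}")
print(f"Query over 365 days of files:  ${cost_one_year:.4f}")
```

At this data volume the scan charge per query is small; the cost becomes harder to predict mainly when many queries repeatedly scan a growing history of files.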

Hope that helps!

AWS
Manos_S
Answered 4 years ago
Expert
Reviewed a month ago
