Use Redshift Spectrum to query both a Redshift table and an S3 file

A customer receives external data (one file of about 100 MB per day) in S3. They need to generate a report with data from both a Redshift table and the S3 file. They are asking whether it is possible to query the Redshift table and S3 at the same time via Redshift Spectrum, without loading the S3 file into Redshift. If yes, is it a best practice to do that? What are the pros and cons?

Asked 4 years ago, 489 views
1 answer
Accepted answer

Hi,

Yes, it is possible to query, and for example join, data from the Redshift cluster and S3 in the same statement. Redshift Spectrum external tables let you query the data in S3 in place. See: Querying using Redshift Spectrum.
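As a rough illustration of such a join, the sketch below holds the three SQL statements involved as Python strings: an external schema, an external table over the S3 prefix, and a report query joining it to a local table. Every identifier in it (schema, table and column names, the bucket, the IAM role ARN) is hypothetical, and it assumes the daily file is comma-delimited text; adjust the table definition to the real file format.

```python
# Minimal sketch: SQL for joining a local Redshift table with an S3 file
# through Redshift Spectrum. All names below are hypothetical placeholders.

# 1. External schema backed by the data catalog (role ARN is made up).
EXTERNAL_SCHEMA_SQL = """
CREATE EXTERNAL SCHEMA spectrum_demo
FROM DATA CATALOG
DATABASE 'spectrum_demo_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
"""

# 2. External table over the S3 prefix that receives the daily file.
#    Assumes comma-delimited text; the columns are invented for illustration.
EXTERNAL_TABLE_SQL = """
CREATE EXTERNAL TABLE spectrum_demo.daily_feed (
    customer_id INTEGER,
    amount      DECIMAL(12, 2),
    event_date  DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-bucket/daily-feed/';
"""

# 3. Report query: local table (public.customers) joined to the S3 data.
REPORT_SQL = """
SELECT c.customer_name, SUM(f.amount) AS total_amount
FROM public.customers AS c
JOIN spectrum_demo.daily_feed AS f
  ON f.customer_id = c.customer_id
GROUP BY c.customer_name;
"""


def report_statements():
    """Return the statements in the order they would be executed."""
    return [s.strip() for s in (EXTERNAL_SCHEMA_SQL, EXTERNAL_TABLE_SQL, REPORT_SQL)]


if __name__ == "__main__":
    for statement in report_statements():
        print(statement, end="\n\n")
```

The first two statements are one-time setup; after that, each new file dropped under the S3 prefix is visible to the report query without a load step.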

Pros

  • Querying the data in place can save cost. The larger and more infrequently accessed your data set in S3 is, the more cost-efficient Spectrum becomes.
  • Flexibility. Querying the data in place also means that the data in S3 stays easily accessible to other applications, such as ML or big data processing with EMR, without the need to integrate with a data warehouse (DW). That leaves the DW to do what it is supposed to do, i.e. reporting.

Cons

  • Less predictable costs, because Spectrum queries are billed over and above the Redshift cluster, currently at $5 per TB scanned.
  • Spectrum might be slower than other solutions, e.g. Athena or Redshift.
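To get a feel for the scan cost mentioned above, here is a back-of-the-envelope sketch. It uses the $5 per TB figure from this answer (which may have changed since), assumes 1 TB = 10^12 bytes, and ignores any minimum charge, rounding, compression, or partition pruning; the function name and defaults are illustrative only.

```python
# Rough Spectrum scan-cost estimate. The rate comes from the answer above;
# the TB definition and the function itself are assumptions for illustration.

BYTES_PER_TB = 10**12  # assumption: decimal terabyte


def estimate_scan_cost(bytes_scanned, usd_per_tb=5.0):
    """Return the approximate charge in USD for scanning bytes_scanned bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


if __name__ == "__main__":
    # One report that scans 30 daily files of about 100 MB each.
    month_of_files = 30 * 100 * 10**6
    print(f"Estimated cost per report: ${estimate_scan_cost(month_of_files):.4f}")
```

Under these assumptions, a report scanning 30 such files costs on the order of a couple of cents, so for this data volume the unpredictability matters more when reports run very frequently or the data grows.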

Hope that helps!

AWS
Manos_S
Answered 4 years ago
Reviewed by an expert 1 month ago
