- 新しい順
- 投票が多い順
- コメントが多い順
More detailed info is in my blog: https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/. Spectrum layer will do what it needs to do (push-down operations). Hopefully it will "filter" out majority of the rows off S3 before sending a small portion back to the main Redshift cluster for further processing (such as joins or DISTINCT). No, a tiny 2 x dc2.large cluster would not be able to handle 1M of 1GB Parquet files in S3 and do joins on these large external tables. Each slice of the main Redshift cluster can invoke up to a max of 10 Spectrum nodes per query. Data post-Spectrum filtering will be sent to Redshift slices depending on the next step in the execution pipeline (as generated by Redshift Optimizer) and hashing values of the join/GBY columns, etc. This is not much different from performing joins between a regular Redshift table that is using DISTSTYLE EVEN and another Redshift table that uses DISTKEY distribution.
関連するコンテンツ
- AWS公式更新しました 2年前
- AWS公式更新しました 1年前
- AWS公式更新しました 1年前