Your approach is not uncommon; I have seen many organizations start with Athena to meet their query response times. Having an S3 data lake future-proofs your architecture and gives you the flexibility to switch compute engines later. Redshift can be brought in once you find that Athena cannot deliver the performance you need. With Redshift Serverless you get more powerful compute for the queries that need better performance than Athena can offer. You pay for use: if queries run only 3 hours a day, that is your compute cost, because Redshift Serverless does not charge for idle time. Redshift Serverless is also fully integrated with the S3 data lake, so you can query data in place without copying it into local Redshift tables. However, I have also seen customers create aggregated and pre-joined datasets as local Redshift tables to meet tighter query SLAs.
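To make the pay-for-use point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes Redshift Serverless is billed in RPU-hours only while queries are running, and that the workload runs at a constant base capacity. The RPU count and price per RPU-hour below are placeholder assumptions, not quoted prices; check the Redshift pricing page for your region. It also ignores storage and any minimum-charge rules.

```python
def redshift_serverless_monthly_cost(
    active_hours_per_day: float,
    base_rpus: int,
    price_per_rpu_hour: float,
    days_per_month: int = 30,
) -> float:
    """Rough compute-cost estimate; assumes constant capacity and no idle charges."""
    # Only hours with running queries are counted; idle hours add nothing
    rpu_hours = active_hours_per_day * days_per_month * base_rpus
    return rpu_hours * price_per_rpu_hour


# Placeholder inputs (assumed): 8 RPUs at 0.375 per RPU-hour
three_hours = redshift_serverless_monthly_cost(3, 8, 0.375)
always_on = redshift_serverless_monthly_cost(24, 8, 0.375)
print(three_hours, always_on)
```

With these assumed inputs, 3 active hours a day costs one eighth of what a 24-hour workload at the same capacity would, which is the point of having no idle charges.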
Hello
There are cases where S3 + Athena won't reach the same performance as Redshift or RDS, especially for very complex queries. However, if your data and the queries you run perform well on Athena, there is certainly potential for significant savings. It's best to run a small proof of concept (PoC) to see which of the possible solutions fits your case better.
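When running such a PoC, it can help to estimate Athena's cost per query alongside its latency. The sketch below assumes Athena is billed per amount of data scanned, rounded up to the nearest megabyte with a 10 MB minimum per query. The price per TB is a parameter you supply from the pricing page; treating 1 MB as 2**20 bytes and 1 TB as 2**20 MB is an assumption of this sketch.

```python
import math


def athena_query_cost(bytes_scanned: int, price_per_tb: float, min_mb: int = 10) -> float:
    """Estimated cost of one query; assumes per-MB rounding and a minimum billed size."""
    mb_per_tb = 1024 * 1024          # assumed binary units
    bytes_per_mb = 1024 * 1024
    # Round the scanned bytes up to whole megabytes, then apply the minimum
    billed_mb = max(math.ceil(bytes_scanned / bytes_per_mb), min_mb)
    return billed_mb / mb_per_tb * price_per_tb


# Example: a query that scans exactly 1 TB, with an assumed price of 5.0 per TB
print(athena_query_cost(2**40, 5.0))
```

Because the cost scales with bytes scanned, the same PoC query can be much cheaper or more expensive depending on how much data each query actually reads.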
As for ETL, if your transformations are simple and short enough, Lambda will probably fit very well, and with Step Functions orchestration the jobs and workflows will also be easy to maintain. Glue is more powerful if you need that, and it doesn't have Lambda's limitations, such as the 15-minute maximum execution time.
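For illustration, a simple transformation step of the kind that fits in Lambda might look like the sketch below. The event shape (a `records` list) and the field names `id`, `amount`, and `country` are hypothetical; the only fixed part is the standard Python Lambda handler signature `handler(event, context)`.

```python
from typing import Any


def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Hypothetical Lambda ETL step: normalize records passed in by the caller."""
    cleaned = []
    for rec in event.get("records", []):
        # Skip records missing the (assumed) required "id" field
        if "id" not in rec:
            continue
        cleaned.append({
            "id": str(rec["id"]),
            "amount": round(float(rec.get("amount", 0)), 2),
            "country": str(rec.get("country", "")).upper(),
        })
    # The returned dict is the function's result, which Step Functions
    # can pass on as input to the next state in the workflow
    return {"records": cleaned, "count": len(cleaned)}


print(handler({"records": [{"id": 1, "amount": "3.456", "country": "de"}, {"amount": 2}]}, None))
```

Keeping each step small and stateless like this is what makes the Lambda + Step Functions approach easy to maintain; once a step outgrows Lambda's limits, that step is a candidate for Glue.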
