AWS Glue's main focus is exactly the kind of use case you describe, and it also scales to much larger datasets.
Obviously, depending on the complexity of your joins and transformation logic, you can run into challenges if you don't have previous experience with Apache Spark (which Glue ETL is based on). It's probably worth investing some time in understanding how it works and how to monitor it.
Cost effectiveness depends on how efficient your logic is and how you tune your configuration. Glue 4.0 provides a number of improvements and optimizations out of the box that should really help you with that.
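To make the cost point concrete, here is a rough back-of-the-envelope estimator. It is only a sketch: the $0.44 per DPU-hour rate, the 1-minute billing minimum and the DPU counts per worker type are assumptions based on typical published pricing, so check the pricing page for your region. The function name `estimate_job_cost` is made up for this example.

```python
# Assumed DPUs per worker type (verify against current Glue documentation).
DPU_PER_WORKER = {"G.1X": 1, "G.2X": 2}


def estimate_job_cost(worker_type, num_workers, runtime_minutes,
                      usd_per_dpu_hour=0.44):
    """Rough cost of one Glue ETL job run, in USD.

    usd_per_dpu_hour is an assumed rate; it varies by region.
    """
    # Assumption: billing is per second with a 1-minute minimum.
    billed_minutes = max(runtime_minutes, 1)
    dpu_hours = DPU_PER_WORKER[worker_type] * num_workers * billed_minutes / 60
    return round(dpu_hours * usd_per_dpu_hour, 4)


if __name__ == "__main__":
    # Same job on 10 workers: 30 minutes on G.1X vs. 15 minutes on G.2X.
    print(estimate_job_cost("G.1X", 10, 30))
    print(estimate_job_cost("G.2X", 10, 15))
```

Under these assumptions, doubling the worker size only pays off if it at least halves the runtime, which is why tuning the logic usually matters more than adding capacity.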
Crawlers are an optional convenience. You can read the CSV files directly if you only need to read them once (that is, if it is not a table you want to use for other purposes).
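As an illustration of reading without a crawler, a Glue job can point `create_dynamic_frame.from_options` straight at the S3 prefix. This is a minimal sketch: the bucket path and the helper names `csv_source_options` and `read_csv_without_crawler` are hypothetical, and it assumes comma-separated files with a header row.

```python
def csv_source_options(s3_path, has_header=True, separator=","):
    """Build keyword arguments for create_dynamic_frame.from_options.

    Assumes plain CSV files under s3_path; adjust format_options to your data.
    """
    return {
        "connection_type": "s3",
        "connection_options": {"paths": [s3_path], "recurse": True},
        "format": "csv",
        "format_options": {"withHeader": has_header, "separator": separator},
    }


def read_csv_without_crawler(glue_context, s3_path):
    """Read CSV files from S3 into a DynamicFrame, no Data Catalog table needed.

    glue_context is the GlueContext that the job script already creates.
    """
    return glue_context.create_dynamic_frame.from_options(
        **csv_source_options(s3_path)
    )
```

Inside the job script you would call something like `read_csv_without_crawler(glueContext, "s3://example-bucket/input/")`, where the path is a placeholder for your own location.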
Step Functions require a bit of learning but allow you to build advanced workflows. For simple workflows, Glue provides triggers and visual workflows inside Glue itself.
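For the simple case, a conditional trigger that starts one job after another succeeds might look roughly like this with boto3. Treat it as a sketch: the trigger and job names are placeholders, and the helper names `chain_trigger_request` and `create_chain_trigger` are made up for this example.

```python
def chain_trigger_request(trigger_name, upstream_job, downstream_job):
    """Build the request for a Glue trigger that runs downstream_job
    once upstream_job finishes in the SUCCEEDED state."""
    return {
        "Name": trigger_name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,  # activate the trigger immediately
    }


def create_chain_trigger(glue_client, trigger_name, upstream_job, downstream_job):
    """Create the trigger; glue_client is expected to be boto3.client("glue")."""
    request = chain_trigger_request(trigger_name, upstream_job, downstream_job)
    return glue_client.create_trigger(**request)
```

Anything with branching, retries across services, or human approval steps is where Step Functions starts to be worth the extra learning.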