Hi there,
1) For your first question, I would suggest considering AWS Glue for your 150 ETL jobs. It's the serverless ETL offering among the AWS data tools, and you basically pay only for the compute your scripts use, billed per second (similar to Lambda). In terms of scalability and maintenance, most of the heavy lifting is handled for you.
Glue supports a few languages. To run a bash script from a Python job in Glue, you can use the standard-library subprocess module to call the bash script from within a lean Python script:
import subprocess
# Run the bash script and collect its exit status (0 means success).
exit_code = subprocess.call('./practice.sh')
print(exit_code)
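If you want the job to fail loudly when the script fails, a slightly more defensive wrapper could look like the sketch below. The function name run_script and its check parameter are my own (hypothetical) choices, and it assumes bash is available wherever the Python script runs; treat it as a starting point rather than a Glue-specific recipe.

```python
import subprocess
import sys


def run_script(script_path, *args, check=True, timeout=None):
    """Run a shell script with bash and return its exit code.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    result = subprocess.run(
        ["bash", script_path, *args],  # explicit interpreter, no shell=True
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Forward the script's output so it ends up in the job's logs.
    if result.stdout:
        print(result.stdout, end="")
    if result.stderr:
        print(result.stderr, end="", file=sys.stderr)
    # Turn a non-zero exit status into an exception unless told otherwise.
    if check and result.returncode != 0:
        raise RuntimeError(
            f"{script_path} exited with status {result.returncode}"
        )
    return result.returncode
```

Calling run_script('./practice.sh') then either returns 0 or raises, so a failing bash script is not silently reported as a successful run.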
2) For your second question, the quick answer is yes: AWS Batch is designed to run batch workloads in containers. However, you don't need a separate image per script; a single, simple "fetch and run" container image can do the work for all your scripts. You start by building a simple Docker image containing a helper application that can download your script, or even a zip file, from Amazon S3. AWS Batch then launches an instance of your container image, which retrieves your script and runs your job.
Here is a technical guide to do that.
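To give a rough idea of how one image can serve many scripts, here is a sketch that builds the request for boto3's Batch submit_job call, passing the script location to the container through environment variables. The variable names BATCH_FILE_TYPE and BATCH_FILE_S3_URL are the ones I recall from the AWS fetch-and-run sample, so please verify them against the guide; the function names, bucket, queue, and job definition are hypothetical, and how the command list is interpreted depends on your image's entrypoint.

```python
def build_fetch_and_run_request(script_s3_url, job_name, job_queue,
                                job_definition, command=()):
    """Build keyword arguments for boto3's Batch submit_job call.

    Hypothetical helper: the same job definition (one container image)
    is reused for every script; only the S3 location changes per job.
    """
    if not script_s3_url.startswith("s3://"):
        raise ValueError("expected an s3:// URL, got: " + script_s3_url)
    # Assumption: the helper in the image distinguishes scripts from zips.
    file_type = "zip" if script_s3_url.endswith(".zip") else "script"
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": list(command),
            "environment": [
                {"name": "BATCH_FILE_TYPE", "value": file_type},
                {"name": "BATCH_FILE_S3_URL", "value": script_s3_url},
            ],
        },
    }


def submit(request):
    """Submit the job and return its ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so building requests works without it

    return boto3.client("batch").submit_job(**request)["jobId"]
```

With something like this, scheduling each of your scripts is just a loop that builds one request per S3 URL and submits it to the same queue and job definition.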
Hope the above helps. Cheers.
What about MWAA? It's a little bit complex, but more suitable as a full-time ETL scheduler.
I am migrating SAS from an on-premises data center to AWS, so I need to schedule the existing jobs in AWS. I am not using any AWS components to design my ETL jobs, so I believe I can't use Glue in this case. Instead, I am looking for the best option to schedule the existing SAS jobs (which will run on a SAS DI EC2 instance in AWS). Please suggest whether AWS Glue, MWAA, SWF, or any other scheduler would be best for my case.
Hi, thanks for providing the details. However, I am looking for a full-fledged scheduler that provides a user interface, has all the usual scheduler features, and makes it easy to schedule jobs (adding dependencies, a graphical view of the job schedule, etc.). May I know if AWS Batch has these features? How about the MWAA scheduler? Can these tools replace a scheduler such as AutoSys or Control-M? Please note that I am not using AWS components for my jobs; instead, I have installed my application software on EC2 instances and need to run the jobs using scripts. Our job schedule has very complex requirements and dependencies, so I am looking for the best scheduler in AWS.