What is the best job scheduler in AWS?
1) I have 150 ETL jobs that need to be moved from on-premises to the AWS cloud. What is the best way to schedule them in AWS: AWS Batch, SWF, Step Functions, or another service? Which tool or service has all the capabilities of a full-time job scheduler?
2) The AWS Batch documentation says that it supports any job that can run as a Docker container. Does that mean AWS Batch is only meant for Docker container jobs? In my case, all jobs are either bash/ksh scripts or ordinary ETL jobs that read files from EC2 and load data into an RDS database.
Hi there,
1) For the first question, I would suggest considering AWS Glue for your 150 ETL jobs. It is the serverless ETL offering among the AWS data tools, and you pay only for the compute used to run your scripts, billed per second (similar to Lambda). In terms of scalability and maintenance, most of the heavy lifting is handled for you.
Glue supports a few languages, such as Python and Scala. To run a bash script from a Glue Python job, you can call it from a lean Python script using the standard subprocess module:
import subprocess

# Run the script with bash and collect its exit code (0 means success).
exit_code = subprocess.call(['bash', './practice.sh'])
print(exit_code)
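Printing the exit code alone will not make the calling job fail when the script fails. As a minimal sketch (the function name `run_script` is hypothetical, and it assumes `bash` is available where the Python script runs), a wrapper can raise an error on a non-zero exit code so that the failure is visible to whatever runs the Python script:

```python
import subprocess


def run_script(path, *args):
    """Run a shell script with bash and raise if it exits non-zero."""
    result = subprocess.run(
        ["bash", path, *args],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output bytes to str
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{path} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Calling `run_script('./practice.sh')` then returns the script's standard output on success and raises `RuntimeError` otherwise.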
2) For your second question, the quick answer is yes: AWS Batch is designed to run batch workloads using containers. However, you can use a simple "fetch and run" container image to do the work for all your scripts. You start by building a simple Docker image containing a helper application that can download your script, or even a zip file, from Amazon S3. AWS Batch then launches an instance of your container image, which retrieves your script and runs your job.
Here is a technical guide to do that.
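To illustrate the fetch-and-run idea described above, here is a rough Python sketch of such a helper. It is not the code from the guide: the function names and the environment variable names `JOB_SCRIPT_BUCKET` and `JOB_SCRIPT_KEY` are assumptions made up for this example, and `fetch_from_s3` assumes boto3 is installed in the image.

```python
import os
import subprocess
import tempfile


def fetch_and_run(fetch, script_args=()):
    """Download a script via fetch(dest_path), run it with bash, return its exit code."""
    with tempfile.TemporaryDirectory() as workdir:
        script_path = os.path.join(workdir, "job.sh")
        fetch(script_path)  # in a real image this would download from S3
        completed = subprocess.run(["bash", script_path, *script_args])
        return completed.returncode


def fetch_from_s3(dest_path):
    """Hypothetical fetcher: bucket and key come from assumed environment variables."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    bucket = os.environ["JOB_SCRIPT_BUCKET"]  # assumed variable name
    key = os.environ["JOB_SCRIPT_KEY"]        # assumed variable name
    boto3.client("s3").download_file(bucket, key, dest_path)
```

The container entrypoint would call something like `sys.exit(fetch_and_run(fetch_from_s3, sys.argv[1:]))`, so that the container exits with the script's own exit code and a failed script shows up as a failed job. Passing the fetcher as a parameter also lets you test the helper locally with a function that simply writes a script to disk.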
Hope the above helps. Cheers.
Hi, thanks for providing the details. However, I am looking for a full-fledged scheduler that provides a user interface, has all the usual scheduler features, and makes it easy to schedule jobs (adding dependencies, a graphical view of the schedule, etc.). Does AWS Batch have these features? How about the MWAA scheduler? Can these tools replace a scheduler such as AutoSys or Control-M? Please note that my jobs do not use AWS components. Instead, I have installed my application software on EC2 instances and need to run the jobs using scripts. Our job schedule has very complex requirements and dependencies, so I am looking for the best scheduler in AWS.
What about MWAA? It's a little complex, but it is more suitable as a full-time ETL scheduler.
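To make "dependencies" concrete: MWAA runs Apache Airflow, which models a workflow as a directed acyclic graph of tasks and starts a task only after its upstream tasks have finished. The sketch below is not Airflow code. It only illustrates that ordering idea with Python's standard `graphlib` module (Python 3.9+), and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key may run only after every job in its set has finished.
dependencies = {
    "extract_files": set(),
    "load_customers": {"extract_files"},
    "load_orders": {"extract_files"},
    "build_report": {"load_customers", "load_orders"},
}


def execution_order(deps):
    """Return one valid run order that respects all declared dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Here `extract_files` always comes first and `build_report` last, while the two load jobs have no dependency on each other and could run in parallel. A circular dependency raises `graphlib.CycleError`, which is the kind of check a scheduler applies before accepting a job graph.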
I am migrating SAS from an on-premises data center to AWS, so I need to schedule the existing jobs in AWS. I am not using any AWS components to design my ETL jobs, so I believe I can't use Glue in this case. Instead, I am looking for the best option to schedule existing SAS jobs, which will run on a SAS DI EC2 instance in AWS. Please suggest whether AWS Glue, MWAA, SWF, or another scheduler would be best for my case.