Hi there,
ETL Job Automation Best Practices
- Set up complex workflows with AWS Glue workflows, which can chain multiple crawlers, jobs, and triggers. https://docs.aws.amazon.com/glue/latest/dg/workflows_overview.html
- Start a workflow from an Amazon EventBridge event so that it runs automatically in response to system events such as application availability issues or resource changes. https://docs.aws.amazon.com/glue/latest/dg/starting-workflow-eventbridge.html
- Use job bookmarks to track data that has already been processed by a previous run of an ETL job, so that reruns pick up only new data. https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
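As a rough illustration of the three points above, here is a minimal Python sketch that builds the request for an event-driven workflow trigger and enables job bookmarks for the job it starts. The helper names and the `-event-trigger` naming scheme are hypothetical, and `glue_client` is assumed to be a boto3 Glue client (`boto3.client("glue")`); treat it as a starting point rather than a complete setup.

```python
from typing import Any, Dict


def bookmark_arguments(enabled: bool = True) -> Dict[str, str]:
    """Job arguments that switch Glue job bookmarks on or off for a run."""
    option = "job-bookmark-enable" if enabled else "job-bookmark-disable"
    return {"--job-bookmark-option": option}


def event_trigger_request(workflow_name: str, job_name: str,
                          batch_size: int = 1,
                          batch_window_seconds: int = 900) -> Dict[str, Any]:
    """Keyword arguments for create_trigger() describing an EVENT trigger.

    The trigger name below is a made-up convention for this sketch.
    """
    return {
        "Name": f"{workflow_name}-event-trigger",
        "WorkflowName": workflow_name,
        "Type": "EVENT",  # fired by an EventBridge event
        "Actions": [{"JobName": job_name,
                     "Arguments": bookmark_arguments(True)}],
        "EventBatchingCondition": {
            "BatchSize": batch_size,               # events to collect before firing
            "BatchWindow": batch_window_seconds,   # or fire after this many seconds
        },
    }


def create_event_driven_workflow(glue_client: Any, workflow_name: str,
                                 job_name: str) -> None:
    """Create the workflow and its EVENT start trigger (assumes the job exists)."""
    glue_client.create_workflow(Name=workflow_name)
    glue_client.create_trigger(**event_trigger_request(workflow_name, job_name))
```

You would call it as `create_event_driven_workflow(boto3.client("glue"), "my-workflow", "my-etl-job")`, with both names being placeholders. The EventBridge rule that targets the workflow still has to be created separately, and bookmarks only take effect if the job script itself calls `job.init()` and `job.commit()`.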
Developing & Testing AWS Glue Job Scripts Best Practices
- If you prefer a local or remote development experience, the AWS Glue Docker image is a good choice. It lets you develop and test Glue job scripts anywhere you like without incurring AWS Glue costs. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html
- If you prefer local development without Docker, installing the AWS Glue ETL library locally is a good choice. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html
- Unit test and deploy AWS Glue jobs using AWS CodePipeline. https://aws.amazon.com/blogs/devops/how-to-unit-test-and-deploy-aws-glue-jobs-using-aws-codepipeline/
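One way to make a job script easy to unit test, whether locally, in the Docker image, or in a pipeline, is to keep the transformation logic in plain functions that do not depend on a Glue or Spark context. The sketch below is only an illustration: `clean_record` is a hypothetical transformation, not something taken from the linked blog post.

```python
import unittest
from typing import Dict, Optional


def clean_record(record: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Trim whitespace from string values and turn empty strings into None."""
    cleaned: Dict[str, Optional[str]] = {}
    for key, value in record.items():
        if value is None:
            cleaned[key] = None
            continue
        stripped = value.strip()
        cleaned[key] = stripped or None  # "" becomes None
    return cleaned


class CleanRecordTest(unittest.TestCase):
    def test_trims_and_nulls_empty_values(self):
        result = clean_record({"id": " 42 ", "note": "   ", "tag": None})
        self.assertEqual(result, {"id": "42", "note": None, "tag": None})
```

Saved as a module, the test can be run with `python -m unittest` on a laptop or as a build step, and the Glue job script then only needs to apply the function to each record.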
Customer Case Study
"By using AWS Glue, ShopFully can automate the ETL services that had taken up time on its previous solution and create individual clusters for each campaign, reducing latency and efficiently handling petabytes of data". Read more here: https://aws.amazon.com/solutions/case-studies/shopfully-case-study/