1 Answer
There is no right number, since it depends on the data, the rules, and how long you are willing to wait. Sometimes more workers won't help at all if there is a bottleneck.
In general, try to avoid custom SQL rules, since each one has to run independently and makes a separate pass over the data.
If you apply the same rules in a Glue job, you can use the Spark UI to view the execution and perhaps find where the bottleneck is.
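To make the Spark UI suggestion concrete, here is a minimal sketch of the job arguments that turn on Spark event logging for a Glue job. The helper name `spark_ui_job_arguments` and the S3 path are hypothetical placeholders; `--enable-spark-ui` and `--spark-event-logs-path` are the Glue job parameters as I understand them, so check them against the current Glue documentation before relying on this.

```python
def spark_ui_job_arguments(event_log_path):
    """Build Glue job arguments that enable Spark UI event logging.

    event_log_path is an S3 prefix where Spark event logs are written
    (hypothetical value in the example below).
    """
    return {
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": event_log_path,
    }


# Example with a placeholder bucket; replace with your own S3 prefix.
job_args = spark_ui_job_arguments("s3://example-bucket/spark-event-logs/")
print(job_args)
```

These would be passed as default arguments when the job is created or as arguments for a single run; the event logs can then be opened in a Spark history server to see which stages dominate the runtime.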

Thank you, this helps already. Unfortunately, we need a bunch of CustomSql rules because we have many columns that can be NULL or must have a specific format. Would you say, as a rule of thumb, we should have one worker for each CustomSql rule and some extra for the other rules?

The general idea is that column rules are cheaper than table rules, so try to do that format check with a column rule. No, you cannot directly relate the number of rules to the number of workers; the volume of data has a bigger impact, so it's not a linear correlation.
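As an illustration of replacing CustomSql checks with column rules, the sketch below generates a DQDL ruleset that uses only `IsComplete` and `ColumnValues ... matches`. The helper `build_ruleset`, the column names, and the date pattern are all hypothetical; how `ColumnValues` treats NULL values should be verified against the DQDL reference for your Glue version.

```python
def build_ruleset(not_null_columns, format_columns):
    """Return a DQDL ruleset string made only of column-level rules.

    not_null_columns: column names that must not contain NULLs.
    format_columns: mapping of column name to a regex the values must match.
    """
    rules = [f'IsComplete "{col}"' for col in not_null_columns]
    for col, pattern in format_columns.items():
        rules.append(f'ColumnValues "{col}" matches "{pattern}"')
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


# Hypothetical columns and pattern, for illustration only.
ruleset = build_ruleset(
    not_null_columns=["order_id", "customer_id"],
    format_columns={"order_date": "[0-9]{4}-[0-9]{2}-[0-9]{2}"},
)
print(ruleset)
```

The resulting string can be pasted into a Glue Data Quality ruleset; keeping one rule per column makes it easy to compare run times against the equivalent CustomSql version.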