AWS Glue data quality dynamic rules not working


I tried AWS Glue Data Quality dynamic rules in my AWS Glue pipeline. I wrote the following rule: RowCount > avg(last(3))

I then processed 3 CSV files with 1000, 10000 and 100 rows. In the 4th run I again processed a file with 100 rows, expecting this rule to fail, but it didn't. I got the following passing result:

RowCount > avg(last(3)) Rule passed Dataset.*.RowCount: 100.00

  • Yes, it sounds like it thinks the average is 100; I'm not sure why. What did it report on the previous evaluations?
    How did you make it process different files on different runs? Do you have a count confirming that the job actually read all the rows? (Maybe it discarded invalid rows.)

  • This worked on the 5th run. I think the issue is that the rule was not present in my first 3 runs; I only added it in the 4th run, so it didn't take the earlier runs into consideration. But this is not mentioned anywhere in the documentation. There is no mention of how Glue stores the 'state' of runs.

Asked 3 months ago, 171 views
1 Answer

That's a fair point about clarifying the documentation. The history is built from the metrics recorded by previous evaluations: if the rule wasn't applied in a run, no metric was recorded for that run, and the rule cannot go back and reprocess earlier data to reconstruct the missing history.
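To make this concrete, here is a small Python simulation of the behaviour described in this thread. It is only a sketch, not Glue's actual implementation: the function names are made up for illustration, and treating an empty history as a pass is an assumption based on the result reported in the question rather than on documented behaviour.

```python
from statistics import mean


def evaluate_row_count_rule(history, current_row_count, k=3):
    """Toy evaluation of `RowCount > avg(last(k))` against recorded metrics."""
    window = history[-k:]  # the most recent k recorded values, if any
    if not window:
        # Assumption: with no recorded history the rule passes,
        # which matches what the asker observed on the 4th run.
        return True
    return current_row_count > mean(window)


def simulate(runs):
    """Each run is (row_count, rule_present). A metric is recorded only
    when the rule is part of the ruleset for that run."""
    history = []
    results = []
    for row_count, rule_present in runs:
        if rule_present:
            results.append(evaluate_row_count_rule(history, row_count))
            history.append(row_count)
        else:
            results.append(None)  # rule absent: nothing evaluated or recorded
    return results


# Runs 1-3 without the rule, then two 100-row runs with it.
runs = [(1000, False), (10000, False), (100, False), (100, True), (100, True)]
print(simulate(runs))  # [None, None, None, True, False]
```

In this model the 4th run sees no history at all, so the first three files never influence the average. Only from the 5th run onward is there a recorded value to compare against.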

AWS
Expert
Answered 2 months ago
