AWS Glue data quality dynamic rules not working


I tried AWS Glue Data Quality dynamic rules in my AWS Glue pipeline. I wrote the following rule: RowCount > avg(last(3))

Then I processed three CSV files with 1000, 10000, and 100 rows. On the 4th run I again processed a file with 100 rows, expecting this rule to fail. But it didn't fail. I got the pass result below:

RowCount > avg(last(3)) Rule passed Dataset.*.RowCount: 100.00

  • Yep, it sounds like it thinks the average is 100; not sure why. What did it say on the previous evaluations?
    How did you make it process different files on different runs? Do you have any count confirming that the job actually read them all? (Maybe it threw away invalid rows.)

  • This worked on the 5th run. I think the issue is that the rule was not present in my first 3 runs; I added it only in the 4th run, so it didn't take the earlier runs into consideration. But this is not mentioned anywhere in the documentation. There is no mention of how Glue stores the 'state' of runs.

Asked 3 months ago · 171 views
1 Answer

That's a fair point about clarifying the documentation. The history is based on the historical metrics: if the rule wasn't applied, the metric wasn't created. The rule cannot go back and process earlier data to compute a history for runs in which it wasn't applied.
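To make the behaviour described above concrete, here is a minimal Python sketch of the scenario from the question. It is a toy model, not Glue's implementation: the `MetricHistory` class and the `simulate` function are hypothetical names, and the choice to pass the rule when no history exists is an assumption made only to match the result the asker saw on the 4th run.

```python
from statistics import mean


class MetricHistory:
    """Toy stand-in for stored RowCount metrics (hypothetical, not a Glue API)."""

    def __init__(self):
        self.row_counts = []

    def evaluate_row_count_rule(self, row_count, window=3):
        """Evaluate RowCount > avg(last(window)) against the stored metrics."""
        recent = self.row_counts[-window:]
        # Assumption: with no stored history the rule passes,
        # which mirrors the asker's 4th run.
        passed = True if not recent else row_count > mean(recent)
        # The metric is recorded only because the rule was evaluated.
        self.row_counts.append(row_count)
        return passed


def simulate(runs):
    """runs: list of (row_count, rule_present) tuples in execution order."""
    history = MetricHistory()
    results = []
    for row_count, rule_present in runs:
        if rule_present:
            results.append(history.evaluate_row_count_rule(row_count))
        else:
            # Rule absent: nothing is evaluated and no metric is stored.
            results.append(None)
    return results


# Runs 1-3 without the rule, runs 4-5 with it (row counts from the question;
# the 5th file size is assumed to be 100 rows as well).
runs = [(1000, False), (10000, False), (100, False), (100, True), (100, True)]
print(simulate(runs))  # [None, None, None, True, False]
```

In this model the 4th run passes because no metric has been stored yet, and the 5th run fails because the only stored value is 100 and 100 > 100 is false. Had the rule been present from the first run, the 4th run would have been compared against the average of 1000, 10000, and 100, which is 3700, and would have failed.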

AWS
Expert
Answered 3 months ago
