AWS Glue data quality dynamic rules not working

I tried AWS Glue Data Quality dynamic rules in my AWS Glue pipeline. I wrote the following rule: RowCount > avg(last(3))

I then processed three CSV files with 1000, 10000, and 100 rows. On the 4th run I processed another file with 100 rows, expecting the rule to fail, since 100 is below the average of the previous three row counts (3700). But it didn't fail. I got the following passing result:

RowCount > avg(last(3)) Rule passed Dataset.*.RowCount: 100.00

  • Yes, it sounds like it thinks the average is 100; not sure why. What did it report on the previous evaluations?
    How did you make it process different files on different runs? Do you have a count confirming that the job actually read all the rows? (Maybe it discarded invalid rows.)

  • This worked on the 5th run. I think the issue is that the rule was not present in my first 3 runs; I added it only in the 4th run, so the earlier runs weren't taken into consideration. But this isn't mentioned anywhere in the documentation. There is no mention of how Glue stores the 'state' of runs.

asked 2 months ago, 134 views
1 Answer

That's a fair point about clarifying the documentation. The history is based on historical metrics: if the rule wasn't applied in a run, no metric was created for that run, and the rule cannot go back and process earlier data to reconstruct the missing history.
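To make the effect concrete, here is a minimal Python sketch of the behavior described in this thread. It is a toy model, not Glue's actual implementation: the function names are hypothetical, the "pass when there is no history" branch is an assumption inferred from the 4th-run result in the question, and the 5th run's row count of 100 is a guess, since the question does not state it.

```python
from statistics import mean


def evaluate_row_count_rule(history, row_count, window=3):
    """Toy model of the rule `RowCount > avg(last(window))`.

    history holds the RowCount metrics recorded by earlier runs in which
    the rule was evaluated. Returns (passed, threshold); threshold is None
    when no history exists.
    """
    recent = history[-window:]
    if not recent:
        # Assumption: with no stored metrics there is nothing to compare
        # against, so the rule is treated as passing.
        return True, None
    threshold = mean(recent)
    return row_count > threshold, threshold


def simulate(runs):
    """runs is a list of (row_count, rule_enabled) pairs, one per job run."""
    history = []
    outcomes = []
    for row_count, rule_enabled in runs:
        if rule_enabled:
            passed, threshold = evaluate_row_count_rule(history, row_count)
            outcomes.append((row_count, passed, threshold))
            # The metric is recorded only on runs that evaluated the rule.
            history.append(row_count)
        else:
            # Rule absent: nothing evaluated, nothing recorded.
            outcomes.append((row_count, None, None))
    return outcomes


# Scenario from the question: rule added only on the 4th run.
thread_outcomes = simulate(
    [(1000, False), (10000, False), (100, False), (100, True), (100, True)]
)

# Same files, but with the rule present from the first run.
all_enabled = simulate([(1000, True), (10000, True), (100, True), (100, True)])

for label, outcomes in (("rule added late", thread_outcomes),
                        ("rule always present", all_enabled)):
    print(label)
    for row_count, passed, threshold in outcomes:
        print(f"  rows={row_count} passed={passed} threshold={threshold}")
```

In this model the 4th run passes in the first scenario because no history exists yet, while in the second scenario the same 100-row file is compared against the average of 1000, 10000, and 100 and fails.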

AWS
EXPERT
answered 2 months ago
