AWS Glue data quality dynamic rules not working


I tried AWS Glue Data Quality dynamic rules in my AWS Glue pipeline. I wrote the following rule: RowCount > avg(last(3))

Then I processed 3 CSV files with 1000, 10000 and 100 rows. On the 4th run I again processed a file with 100 rows, expecting this rule to fail. But it didn't fail. I got the following passing result:

RowCount > avg(last(3)) Rule passed Dataset.*.RowCount: 100.00
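
For reference, the expectation behind the question can be checked with a few lines of Python. This is only the arithmetic the asker had in mind, assuming all three earlier runs counted as history; the variable names are illustrative and nothing here calls AWS Glue.

```python
from statistics import mean

# Row counts of the three earlier runs, as described in the question.
previous_row_counts = [1000, 10000, 100]
current_row_count = 100

# avg(last(3)) over those three runs, if they were all in the history.
threshold = mean(previous_row_counts)

# The rule RowCount > avg(last(3)) would only pass above the threshold.
rule_passes = current_row_count > threshold
print(threshold, rule_passes)
```

Under that assumption the threshold is 3700, so a 100-row file would be expected to fail the rule.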

  • Yep, it sounds like it thinks the average is 100; not sure why. What did it say on the previous evaluations?
    How did you make it process different files on different runs? Do you have any count confirming that the job actually read them all? (Maybe it threw away invalid rows.)

  • This worked on the 5th run. I think the issue is that the rule was not there in my first 3 runs; I added it only in the 4th run, so it didn't take the earlier runs into consideration. But this is not mentioned anywhere in the documentation. There is no mention of how Glue stores the 'state' of runs.

asked 3 months ago, 172 views
1 Answer

That's a fair point about clarifying the documentation. The history is based on the historical metrics: if the rule wasn't applied in a run, the metric wasn't created for that run, and the rule cannot go back and process earlier data to compute history for runs in which it wasn't applied.
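
The behaviour described above can be illustrated with a toy model in Python. This is a sketch, not Glue's actual implementation: `evaluate_run`, `history` and `window` are hypothetical names, and the choice to pass the rule when no history exists is an assumption made only because it matches the result reported for the 4th run in this thread.

```python
from statistics import mean

def evaluate_run(row_count, rule_enabled, history, window=3):
    """Toy model of one job run for the rule RowCount > avg(last(window)).

    A metric is appended to `history` only when the rule is enabled,
    mirroring the statement that no metric exists for runs without the rule.
    """
    if not rule_enabled:
        return "not evaluated"  # nothing is recorded for this run
    recent = history[-window:]
    # Assumption: with no recorded history the rule passes (as seen on run 4).
    if not recent or row_count > mean(recent):
        outcome = "passed"
    else:
        outcome = "failed"
    history.append(row_count)  # this run's metric becomes history for later runs
    return outcome

history = []
# (row count, was the rule part of the ruleset?) for the five runs in the thread.
runs = [(1000, False), (10000, False), (100, False), (100, True), (100, True)]
outcomes = [evaluate_run(count, enabled, history) for count, enabled in runs]
print(outcomes)
```

In this model the first three runs leave no metrics behind, the 4th run passes because there is nothing to compare against, and the 5th run fails because 100 is not greater than the single recorded value of 100, which is consistent with what was reported.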

AWS
EXPERT
answered 3 months ago
