Data Quality using PyDeequ
Hi, does anyone use PyDeequ in a large enterprise? I am exploring this library and have the questions below:
- Looking at the GitHub repo, it doesn't seem to be actively updated. Also, it supports Spark 3.0.0 but not later versions.
- Some of the APIs didn't work (for complex examples). I don't know if there is any Amazon support.
- Also, the Scala version (Deequ) is more up to date than the Python version (PyDeequ). Is there a plan to sunset PyDeequ?
- Should I use this for a large enterprise data validation framework, or are there alternative tools? Kindly advise.
Thank you!
asked 13 days ago, 12 views
1 Answer
Hi
To answer question 4: I would recommend taking a look at AWS Glue DataBrew. Not only is it a fully managed service, but you'll also find that it has a better velocity of new features and updates, as it's supported by the AWS Glue team.
Thanks
Nick
answered 13 days ago