1 Answer
There is no built-in capability in Amazon DataZone specifically for computing quantile statistics on data assets. Amazon DataZone primarily focuses on providing a secure, governed data catalog for managing and sharing data within organizations. However, you can use other AWS services or external tools to compute quantile statistics on your data assets. Here are a few options:
- AWS Glue: AWS Glue is an ETL service that can be used to transform and analyze your data. You can use Glue to extract data from various sources, perform transformations, and load the results into a data warehouse or data lake. Once the data is in a suitable format, you can use SQL queries or statistical functions to calculate metrics like min, max, median, standard deviation, and quantiles.
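- Amazon Athena: Amazon Athena is a serverless interactive query service that allows you to analyze data stored in Amazon S3 using standard SQL. You can query your dataset with SQL functions like MIN, MAX, STDDEV, and APPROX_PERCENTILE to calculate quantile statistics. Athena's query engine does not provide MEDIAN or PERCENTILE_CONT (those are available in Amazon Redshift); APPROX_PERCENTILE(col, 0.5) returns an approximate median instead.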
- External Tools: You can use external data analysis tools like Python's pandas library, R, or Apache Spark to load and analyze your dataset. These tools provide extensive statistical functions, including quantile calculations, that can be applied to your data.
To perform quantile statistics using one of these approaches, you would typically load your dataset into a suitable storage or compute service (e.g., Amazon S3, Amazon Redshift, or an Amazon EMR cluster) and then apply the appropriate statistical functions or queries to calculate the desired metrics.
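As a rough illustration of the kind of metrics discussed above, here is a minimal sketch that uses only Python's standard `statistics` module on a small in-memory list. The `summarize` helper and the sample values are hypothetical and not part of any AWS service; for real datasets you would more likely compute the same metrics with pandas, Spark, or a SQL engine.

```python
import statistics


def summarize(values, n=4):
    """Return min, max, median, sample standard deviation and n-quantile cut points.

    Hypothetical helper for illustration only; it is not an AWS or DataZone API.
    """
    data = sorted(values)
    return {
        "min": data[0],
        "max": data[-1],
        "median": statistics.median(data),
        # Sample standard deviation (divides by len(data) - 1).
        "stddev": statistics.stdev(data),
        # n=4 yields the three quartile cut points; "inclusive" treats the
        # data as the full population, including its min and max.
        "quantiles": statistics.quantiles(data, n=n, method="inclusive"),
    }


# Made-up sample values standing in for one numeric column of a data asset.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
stats = summarize(sample)
print(stats)
```

Passing a different `n` (for example `n=10` for deciles) changes how many cut points are returned.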
Please check if my answer to your query helps.
answered 3 years ago

Thanks for the response.
We already have a catalog solution in which we use PySpark to compute these statistics and make them available in the catalog (via QuickSight). We are now exploring the DataZone option.
If we already have these statistics generated, is there a way to import them into the DataZone catalog? Or are there any future plans for this?