How to retrieve partition columns from Glue Catalog table using pyspark


Hi Folks,

I have a table called demo that is cataloged in Glue. The table has three partition columns (col_year, col_month and col_day). I want to get the names of the partition columns programmatically using PySpark. The output should be just the partition keys, not the partition values, as shown below:

col_year, col_month, col_day

Could you please help me get the desired output? Thank you.

Regards, AP

1 Answer

Hello,

Generally, the partition columns are the last columns in the schema. So you can get the schema, convert it to a list of column names, and read the last n columns, where n is the number of partition columns.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-schema
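A minimal sketch of the "last n columns" approach, assuming the partition columns really do come last in the schema and that you already know how many there are (three for the demo table). The helper names, the database name passed in, and the non-partition column names in the usage note are hypothetical; `glue_context` is assumed to be the `GlueContext` already created in your Glue job.

```python
from typing import List, Sequence


def last_n_columns(columns: Sequence[str], n: int) -> List[str]:
    """Return the last n column names, assumed here to be the partition columns."""
    if n < 0 or n > len(columns):
        raise ValueError("n must be between 0 and the number of columns")
    # Slice from an explicit start index so that n == 0 yields an empty list.
    return list(columns[len(columns) - n:])


def partition_columns_from_catalog(glue_context, database: str, table: str, n: int) -> List[str]:
    """Read the catalog table as a DynamicFrame and take its last n column names.

    Assumption: glue_context is an awsglue.context.GlueContext from a Glue job.
    """
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table
    )
    # toDF() converts to a Spark DataFrame; .columns is the ordered list of names.
    return last_n_columns(dyf.toDF().columns, n)
```

For example, `", ".join(last_n_columns(["id", "amount", "col_year", "col_month", "col_day"], 3))` gives `col_year, col_month, col_day`. This depends entirely on column order, so it is worth checking against the table definition before relying on it.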

As an alternative outside of PySpark, you can call the Glue GetTable API, read the partition keys from the response, and use them in your code.

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glue/get-table.html#output
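The GetTable route could look roughly like this with boto3, assuming the job has credentials and permission to call `glue:GetTable`. The function names are hypothetical, and the database name is something you would supply yourself. The partition column names come from the `PartitionKeys` list in the response.

```python
from typing import Any, Dict, List


def partition_key_names(get_table_response: Dict[str, Any]) -> List[str]:
    """Extract partition column names from a Glue GetTable response."""
    table = get_table_response["Table"]
    # PartitionKeys is a list of column definitions, each with a "Name" and a "Type".
    return [key["Name"] for key in table.get("PartitionKeys", [])]


def fetch_partition_key_names(database: str, table: str) -> List[str]:
    """Call Glue GetTable and return the partition column names."""
    import boto3  # imported lazily so the parsing helper works without boto3

    glue = boto3.client("glue")
    response = glue.get_table(DatabaseName=database, Name=table)
    return partition_key_names(response)
```

Calling `", ".join(fetch_partition_key_names("my_database", "demo"))`, where `my_database` is a placeholder, should return the keys in the order they are defined on the table. Unlike the column-order approach, this does not require knowing the number of partition columns in advance.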

AWS
Support Engineer
Answered 1 year ago
