How to retrieve partition columns from a Glue Catalog table using PySpark


Hi Folks,

I have a table called demo that is cataloged in Glue. The table has three partition columns (col_year, col_month and col_day). I want to get the names of the partition columns programmatically using PySpark. The output should list just the partition keys, as shown below:

col_year, col_month, col_day

Could you please help me get the desired output? Thank you.

Regards, AP

Asked 1 year ago, viewed 471 times
1 Answer

Hello,

Generally, the partition columns are the last columns in the schema. So you can get the schema, convert it to a list, and read the last n columns, where n is the number of partition columns.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-schema
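The "last n columns" idea can be sketched as follows. This is only an illustration: the helper name last_n_columns, the sample column list, and the database name in the comment are assumptions, and the approach only works if you already know how many partition columns the table has and they really do come last in the schema.

```python
def last_n_columns(column_names, n):
    """Return the trailing n column names, where partition columns usually sit."""
    if n <= 0:
        return []
    return list(column_names[-n:])


# With a Spark DataFrame the column list would come from df.columns, e.g.:
#   df = spark.table("my_database.demo")   # database name is hypothetical
#   partition_cols = last_n_columns(df.columns, 3)
# The list below is a made-up stand-in for the demo table's columns.
columns = ["id", "amount", "col_year", "col_month", "col_day"]
partition_cols = last_n_columns(columns, 3)
print(", ".join(partition_cols))
```

Because this relies on column order rather than on catalog metadata, it is worth checking the result against the table definition at least once.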

As an alternative outside PySpark, you can call the Glue GetTable API, read the partition keys from the response, and use them in your code.

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glue/get-table.html#output
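A minimal sketch of the GetTable route using boto3, which can be called from inside a PySpark job. The GetTable response carries the partition columns under Table.PartitionKeys. The helper names, the database name my_database, and the column types in the sample response are assumptions for illustration; the real call also needs AWS credentials and a region configured.

```python
def partition_key_names(get_table_response):
    """Extract partition column names from a Glue GetTable response dict."""
    table = get_table_response.get("Table", {})
    return [key["Name"] for key in table.get("PartitionKeys", [])]


def fetch_partition_key_names(database_name, table_name):
    """Call Glue GetTable and return the table's partition column names."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    response = glue.get_table(DatabaseName=database_name, Name=table_name)
    return partition_key_names(response)


# Sample response trimmed to the fields used here; the types are assumed.
sample_response = {
    "Table": {
        "Name": "demo",
        "PartitionKeys": [
            {"Name": "col_year", "Type": "string"},
            {"Name": "col_month", "Type": "string"},
            {"Name": "col_day", "Type": "string"},
        ],
    }
}
print(", ".join(partition_key_names(sample_response)))

# Against a real catalog (database name is hypothetical):
#   print(", ".join(fetch_partition_key_names("my_database", "demo")))
```

Unlike reading the last n columns of the schema, this does not depend on column order or on knowing the number of partition columns in advance.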

AWS
Support Engineer
Answered 1 year ago
