How to retrieve partition columns from Glue Catalog table using pyspark


Hi Folks,

I have a table called demo, and it is cataloged in Glue. The table has three partition columns (col_year, col_month and col_day). I want to get the names of the partition columns programmatically using PySpark. The output should be just the partition keys (the column names, not their values), like this:

col_year, col_month, col_day

Could you please help me get the desired output? Thank you.

Regards, AP

1 Answer

Hello,

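Generally, the partition columns are the last columns in the table schema. So you can get the schema, convert it to a list of column names, and read the last n columns, which are the partition columns. This assumes you know n (the number of partition keys) in advance.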
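A minimal sketch of the schema-based approach, assuming you already know how many partition keys the table has (three for `demo`). The helper name `partition_columns_from_schema` is hypothetical, and the code relies on the assumption that the partition columns come last, which is typical for Glue/Hive-style partitioned tables but not guaranteed.

```python
def partition_columns_from_schema(column_names, num_partition_keys):
    """Return the last `num_partition_keys` names from an ordered column list.

    Assumes the partition columns are the trailing columns of the schema,
    which is common for Glue/Hive-style partitioned tables but not guaranteed.
    """
    names = list(column_names)
    if num_partition_keys < 0 or num_partition_keys > len(names):
        raise ValueError(
            f"num_partition_keys must be between 0 and {len(names)}, got {num_partition_keys}"
        )
    if num_partition_keys == 0:
        return []  # names[-0:] would return the whole list, so handle 0 explicitly
    return names[-num_partition_keys:]


# Example with a plain list standing in for the table's column names
# (the non-partition columns "id" and "amount" are made up for illustration).
columns = ["id", "amount", "col_year", "col_month", "col_day"]
print(", ".join(partition_columns_from_schema(columns, 3)))  # prints: col_year, col_month, col_day
```

In a Glue job, the column list could come from a DynamicFrame, for example `dyf = glueContext.create_dynamic_frame.from_catalog(database="my_db", table_name="demo")` followed by `dyf.toDF().columns`, where `my_db` is a placeholder for your database name. Because this relies on column order and on knowing n in advance, reading the partition keys from the catalog metadata is more robust.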

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-schema

As an alternative to PySpark, you can call the Glue GetTable API, read the table's partition keys (the PartitionKeys field of the returned table), and use them in your code.
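A minimal sketch of the GetTable approach using boto3 (an assumption: the answer does not name a specific SDK). The function name `get_partition_keys` and the optional `glue_client` parameter are hypothetical; the parameter exists only so the function can be exercised with a stub client, without AWS access.

```python
def get_partition_keys(database_name, table_name, glue_client=None):
    """Return the partition key names of a Glue Data Catalog table, in order.

    Calls the Glue GetTable API. In the response, Table["PartitionKeys"] is a
    list of column descriptors, each with at least a "Name" field.
    """
    if glue_client is None:
        # Imported lazily so the function can be tested with a stub client
        # in environments where boto3 is not installed.
        import boto3

        glue_client = boto3.client("glue")
    response = glue_client.get_table(DatabaseName=database_name, Name=table_name)
    # Unpartitioned tables may omit PartitionKeys or return an empty list.
    return [key["Name"] for key in response["Table"].get("PartitionKeys", [])]
```

For the table in the question, `", ".join(get_partition_keys("my_db", "demo"))` should produce `col_year, col_month, col_day`, where `my_db` is a placeholder for the actual Glue database name. The call needs AWS credentials with the `glue:GetTable` permission; in a Glue job, the job's IAM role normally supplies these. Unlike the schema-based approach, this does not depend on column order or on knowing the number of partition keys in advance.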

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glue/get-table.html#output

AWS Support Engineer
Answered a year ago
