How to retrieve partition columns from Glue Catalog table using pyspark


Hi Folks,

I have a table called demo that is cataloged in AWS Glue. The table has three partition columns (col_year, col_month and col_day). I want to get the names of the partition columns programmatically using PySpark. The output should look like the following (just the partition key names, not the partition values):

col_year, col_month, col_day

Could you please help me get the desired output? Thank you.

Regards, AP

1 Answer

Hello,

Generally, the partition columns are the last columns in the table schema. So you can get the schema, convert it to a list of column names, and read the last n columns, which are the partition columns.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-schema
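A minimal sketch of the "last n columns" approach follows. It assumes, as the answer says, that the partition columns come last in the schema, and that you already know n (here 3). The helper name `last_n_columns` and the database name `my_database` are hypothetical. Inside a Glue job you would obtain the column list from a DynamicFrame, for example via `dyf.toDF().columns`; that part is shown only as a comment because it needs a Glue runtime.

```python
from typing import List, Sequence


def last_n_columns(column_names: Sequence[str], n: int) -> List[str]:
    """Return the last n column names.

    Assumption: the partition columns are appended at the end of the
    schema, so the last n names are the partition keys.
    """
    if n < 0 or n > len(column_names):
        raise ValueError(f"n must be between 0 and {len(column_names)}, got {n}")
    # Slice from len - n (not -n) so that n == 0 returns an empty list.
    return list(column_names[len(column_names) - n:])


# In a Glue job (not runnable outside Glue; "my_database" is hypothetical):
#   dyf = glueContext.create_dynamic_frame.from_catalog(
#       database="my_database", table_name="demo")
#   column_names = dyf.toDF().columns

if __name__ == "__main__":
    # Illustrative column list; only the three partition columns come from the question.
    column_names = ["id", "amount", "col_year", "col_month", "col_day"]
    print(", ".join(last_n_columns(column_names, 3)))
```

This relies on column order and on knowing n in advance, so it is a heuristic rather than a lookup of which columns are actually partition keys.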

As an alternative outside of PySpark, you can call the Glue GetTable API, read the table's partition keys from the response, and use them in your code.

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glue/get-table.html#output
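A sketch of the GetTable approach using boto3 is shown below. It assumes boto3 is installed and AWS credentials are configured; the function names and the database name `my_database` are hypothetical. The parsing is kept separate from the API call so that it can be checked against a sample response without AWS access. The names are read from the `PartitionKeys` list in the `Table` section of the response, which is the authoritative list of partition keys and does not depend on column order.

```python
from typing import Any, Dict, List


def partition_key_names(get_table_response: Dict[str, Any]) -> List[str]:
    """Extract partition key names from a Glue GetTable response."""
    # PartitionKeys is a list of column descriptions, each with a "Name".
    partition_keys = get_table_response["Table"].get("PartitionKeys", [])
    return [column["Name"] for column in partition_keys]


def fetch_partition_key_names(database: str, table: str) -> List[str]:
    """Call Glue GetTable and return the table's partition key names."""
    import boto3  # imported here so the parser above works without boto3

    glue = boto3.client("glue")
    response = glue.get_table(DatabaseName=database, Name=table)
    return partition_key_names(response)


if __name__ == "__main__":
    # Requires AWS credentials; "my_database" is a hypothetical database name.
    # print(", ".join(fetch_partition_key_names("my_database", "demo")))
    sample_response = {
        "Table": {
            "Name": "demo",
            "PartitionKeys": [
                {"Name": "col_year", "Type": "string"},
                {"Name": "col_month", "Type": "string"},
                {"Name": "col_day", "Type": "string"},
            ],
        }
    }
    print(", ".join(partition_key_names(sample_response)))
```

The same list is available in a PySpark or Glue job by calling `fetch_partition_key_names` from the driver, since it only needs the Glue API and not the data itself.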

AWS
SUPPORT ENGINEER
answered a year ago
