How to retrieve partition columns from Glue Catalog table using pyspark

Hi Folks,

I have a table called demo that is cataloged in Glue. The table has three partition columns (col_year, col_month and col_day). I want to get the names of the partition columns programmatically using PySpark. The output should contain just the partition keys, not the partition values, as shown below:

col_year, col_month, col_day

Could you please help me get the desired output? Thank you.

Regards, AP

1 Answer

Hello,

Generally, the partition columns are the last columns in the schema. So you can get the schema, convert it to a list of column names, and read the last n columns, where n is the number of partition columns (three for this table).

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-schema
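
The schema approach above could be sketched roughly as follows. This is only an illustration: the helper name partition_columns_from_schema and the non-partition columns id and amount are made up, and it assumes you already know how many partition columns the table has and that they really do come last in the schema.

```python
def partition_columns_from_schema(column_names, num_partitions):
    """Return the last `num_partitions` names from an ordered list of columns."""
    if num_partitions <= 0:
        return []
    return list(column_names)[-num_partitions:]


# In a Glue job, the column list could come from the catalog table, e.g.
# (the database name "my_database" is a placeholder, not from the question):
#   dyf = glueContext.create_dynamic_frame.from_catalog(
#       database="my_database", table_name="demo")
#   columns = dyf.toDF().columns
# Here a hard-coded, made-up schema stands in for it.
columns = ["id", "amount", "col_year", "col_month", "col_day"]

print(", ".join(partition_columns_from_schema(columns, 3)))
# prints: col_year, col_month, col_day
```

Note that this relies on column order only, so it cannot tell you how many partition columns there are; the catalog itself is the more reliable source for that.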

Alternatively, outside of PySpark, you can call the Glue GetTable API, read the partition keys from the response, and use them in your code.

https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glue/get-table.html#output
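
From Python, the same GetTable call is available through boto3. A minimal sketch, assuming the job has permission to call glue:GetTable; the helper name get_partition_keys and the database name my_database are placeholders, since the question does not name the database.

```python
def get_partition_keys(glue_client, database_name, table_name):
    """Return the partition key names of a Glue Catalog table, in catalog order."""
    response = glue_client.get_table(DatabaseName=database_name, Name=table_name)
    # PartitionKeys is a list of column definitions such as
    # {"Name": "col_year", "Type": "string"}; it may be absent or empty
    # for an unpartitioned table.
    partition_keys = response["Table"].get("PartitionKeys", [])
    return [key["Name"] for key in partition_keys]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   print(", ".join(get_partition_keys(glue, "my_database", "demo")))
```

Because this reads the partition keys from the catalog definition, it does not depend on column order or on knowing the number of partitions in advance.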

AWS
SUPPORT ENGINEER
answered a year ago