In AWS Glue, I'm not able to add the source file name to the DynamicFrame


I tried two ways to add it. First:

    from pyspark.sql.functions import input_file_name
    newdf = newdf.withColumn('filename2', input_file_name())

I also tried:

    AmazonS3_node = glueContext.create_dynamic_frame.from_options(
        connection_type="s3",
        format="parquet",
        connection_options={"paths": ["s3://bucket/"], "recurse": True},
        transformation_ctx="AmazonS3_node",
        format_options={
            "attachFilename": "filename"
        }
    )

Neither of them works, and in the second case the column is not created at all.

Please help with this

KG
asked 4 months ago, 447 views
1 Answer

I believe the attachFilename option doesn't work with every format.
Try using format="glueparquet". Otherwise, you could use your first approach but read the data directly into a Spark DataFrame (and then convert it to a DynamicFrame if you need one):

    from pyspark.sql.functions import input_file_name
    newdf = spark.read.parquet("s3://bucket/")
    newdf = newdf.withColumn('filename2', input_file_name())
AWS
EXPERT
answered 4 months ago
