In AWS Glue, not able to add the source file name to the DynamicFrame


I tried to add it in two ways. First:

    from pyspark.sql.functions import input_file_name
    newdf = newdf.withColumn('filename2', input_file_name())

I also tried:

    AmazonS3_node = glueContext.create_dynamic_frame.from_options(
        connection_type="s3",
        format="parquet",
        connection_options={"paths": ["s3://bucket/"], "recurse": True},
        transformation_ctx="AmazonS3_node",
        format_options={
            "attachFilename": "filename"
        }
    )

Neither of them works, and in the second case the column is not even created.

Please help with this.

KG
asked 4 months ago · 447 views
1 answer

I believe attachFilename doesn't work for all formats.
Try using format="glueparquet". Otherwise, you could use your first approach but read the data directly into a Spark DataFrame (and then convert it to a DynamicFrame if you need one):

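    from pyspark.sql.functions import input_file_name
    # Read the Parquet files with Spark directly, then tag each row with its source file
    newdf = spark.read.parquet("s3://bucket/")
    newdf = newdf.withColumn('filename2', input_file_name())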
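For intuition about what both options are meant to produce, here is a plain-Python sketch. It is not Glue or Spark code: local CSV files stand in for the S3 objects, and read_with_filename and file_basename are hypothetical helpers written only for this illustration. Each record gets a column holding the path of the file it came from, which is the idea behind input_file_name() and attachFilename. Since input_file_name() returns the full URI (for example s3://bucket/dir/part-0000.parquet), the second helper shows one way to keep only the last path segment if that is all you want.

```python
import csv
import tempfile
from pathlib import Path


def read_with_filename(paths, column="filename"):
    """Read rows from several CSV files and append each row's source path.

    Hypothetical helper: a plain-Python illustration of the idea behind
    input_file_name() / attachFilename, not an AWS Glue or Spark API.
    """
    rows = []
    for path in paths:
        with open(path, newline="") as f:
            for record in csv.DictReader(f):
                # Store the full path, similar to what input_file_name() returns
                record[column] = Path(path).as_posix()
                rows.append(record)
    return rows


def file_basename(uri):
    """Hypothetical helper: last segment of a POSIX path or s3:// URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two small local files standing in for objects under s3://bucket/
        for name, value in [("a.csv", "1"), ("b.csv", "2")]:
            Path(tmp, name).write_text("id\n" + value + "\n")
        paths = sorted(Path(tmp).glob("*.csv"))
        for row in read_with_filename(paths):
            print(row["id"], file_basename(row["filename"]))
    print(file_basename("s3://bucket/dir/part-0000.parquet"))
```

In Spark itself, the same trimming would typically be done with a column expression on the input_file_name() column rather than in Python.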
AWS
EXPERT
answered 4 months ago
