AWS Glue: unable to add the source file name to a DynamicFrame


I tried adding it in two ways:

    from pyspark.sql.functions import input_file_name

    newdf = newdf.withColumn('filename2', input_file_name())

I also tried:

    AmazonS3_node = glueContext.create_dynamic_frame.from_options(
        connection_type="s3",
        format="parquet",
        connection_options={"paths": ["s3://bucket/"], "recurse": True},
        transformation_ctx="AmazonS3_node",
        format_options={
            "attachFilename": "filename"
        }
    )

Neither of them works, and in the second case the column is not even created.

Please help with this

KG
asked 4 months ago · 447 views
1 Answer

I believe `attachFilename` doesn't work with all formats.
Try using `format="glueparquet"`. Otherwise, you could use your first approach but read the data directly into a Spark DataFrame (and then convert it to a DynamicFrame if you need one):

    from pyspark.sql.functions import input_file_name

    newdf = spark.read.parquet("s3://bucket/")
    newdf = newdf.withColumn('filename2', input_file_name())
AWS EXPERT
answered 4 months ago
