AWS Glue: unable to add the source file name to a DynamicFrame


I tried two approaches. The first:

    newdf = newdf.withColumn('filename2', input_file_name())

I also tried:

AmazonS3_node = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    format="parquet",
    connection_options={"paths": [ "s3://bucket/"], "recurse": True},
    transformation_ctx="AmazonS3_node",
    format_options={
        "attachFilename": "filename"
    }
)

Neither approach works. In the second case, the column is not created at all.

Please help with this.

KG
Asked 4 months ago, 447 views
1 answer

I believe attachFilename doesn't work for all formats.
Try format="glueparquet". Otherwise, you could use your first approach, but read the data directly as a DataFrame (and then convert it to a DynamicFrame if you want):

from pyspark.sql.functions import input_file_name

newdf = spark.read.parquet("s3://bucket/")  # the path must be a quoted string
newdf = newdf.withColumn('filename2', input_file_name())
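
For reference, here is a minimal sketch of that DataFrame route wrapped in a helper, including the conversion back to a DynamicFrame. The function name `read_parquet_with_filename`, the `column` default, and the frame name "with_filename" are made up for illustration. `glue_context` is assumed to be the job's existing GlueContext. The imports sit inside the function only so that the module can be loaded outside a Glue runtime.

```python
def read_parquet_with_filename(glue_context, path, column="filename"):
    """Read Parquet files under `path` and attach each row's source file path.

    Hypothetical helper, not part of the Glue API.
    """
    # awsglue and pyspark are only available inside a Glue/Spark runtime.
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.sql.functions import input_file_name

    # Read with the plain Spark reader instead of create_dynamic_frame.
    df = glue_context.spark_session.read.parquet(path)

    # input_file_name() yields the path of the file each row was read from.
    df = df.withColumn(column, input_file_name())

    # Convert back so the rest of the job can keep using DynamicFrame transforms.
    return DynamicFrame.fromDF(df, glue_context, "with_filename")
```

In a job this would be called as `dyf = read_parquet_with_filename(glueContext, "s3://bucket/")`. Note that the column holds the full file path rather than just the base name.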
AWS Expert
Answered 4 months ago
