1 Answer
I believe attachFilename doesn't work with every format.
Try format="glueparquet". Otherwise, you could keep the approach from your first example but read the data directly into a Spark DataFrame (and then convert it to a DynamicFrame if you want to):
from pyspark.sql.functions import input_file_name
newdf = spark.read.parquet("s3://bucket/")
newdf = newdf.withColumn('filename2', input_file_name())
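If you do want a DynamicFrame at the end, the read and the conversion could be wrapped roughly as follows. This is only a sketch: the function name `read_parquet_with_filename`, its parameters, and the frame name "with_filename" are made up for illustration, and it assumes you already have a SparkSession and a GlueContext from your job setup. The only Glue call used is `DynamicFrame.fromDF`.

```python
def read_parquet_with_filename(spark, glue_context, path, column="filename2"):
    """Read Parquet with Spark, tag each row with its source file,
    and hand the result back as a Glue DynamicFrame."""
    # Imported inside the function so the sketch can be loaded
    # outside a Glue/Spark runtime; in a real job these would
    # normally sit at the top of the script.
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.sql.functions import input_file_name

    df = spark.read.parquet(path)
    # input_file_name() yields the full path of the file each row came from.
    df = df.withColumn(column, input_file_name())
    # "with_filename" is a hypothetical name for the resulting frame.
    return DynamicFrame.fromDF(df, glue_context, "with_filename")
```

In a Glue job this would be called as something like `dyf = read_parquet_with_filename(spark, glueContext, "s3://bucket/")`, where `spark` and `glueContext` are whatever your script already creates.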
But if I read through Spark directly, would I still be able to use bookmarks?
The documentation says that attachFilename can be used with any format. I've tested it with CSV and it works. Do you have any idea why it behaves differently for Parquet?
This is the link to the section that covers the attachFilename option: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html#aws-glue-programming-etl-format-shared-reference