1 Answer
I believe the attachFilename option doesn't work for all formats.
Try using format="glueparquet". Otherwise, you could use the first example but read directly into a DataFrame (and then convert it to a DynamicFrame if you want to):
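from pyspark.sql.functions import input_file_name
# input_file_name() gives the path of the file each row was read from
newdf = spark.read.parquet("s3://bucket/")
newdf = newdf.withColumn('filename2', input_file_name())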
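To make both suggestions concrete, here is a rough sketch of how they might look inside a Glue job. The function names, the "with_filename" frame name, and the bucket path are placeholders I made up for illustration; glue_context is assumed to be an existing GlueContext. I have not verified that attachFilename actually populates the column for glueparquet, which is exactly the open question here, so treat the first function as something to try rather than a confirmed fix.

```python
def read_with_attach_filename(glue_context, s3_path, column="filename2"):
    """Option 1 (untested assumption): ask Glue to attach the source file name."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_path]},
        format="glueparquet",
        # attachFilename takes the name of the column to add
        format_options={"attachFilename": column},
    )


def read_parquet_with_filename(glue_context, s3_path, column="filename2"):
    """Option 2: read with Spark, add the file path, convert to a DynamicFrame."""
    # Imported here so the module can be loaded outside a Glue environment.
    from pyspark.sql.functions import input_file_name
    from awsglue.dynamicframe import DynamicFrame

    df = glue_context.spark_session.read.parquet(s3_path)
    df = df.withColumn(column, input_file_name())
    # "with_filename" is just a label for the resulting DynamicFrame.
    return DynamicFrame.fromDF(df, glue_context, "with_filename")
```

In a job you would call it as read_parquet_with_filename(glueContext, "s3://bucket/") and continue with the returned DynamicFrame as usual.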
But if I read through Spark directly, would I still be able to use job bookmarks?
In the documentation, it says that attachFilename can be used with any format. I've tested it with CSV and it works. Do you have any idea why it's different for Parquet?
This is the link to the section that covers the attachFilename option: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html#aws-glue-programming-etl-format-shared-reference