1 Answer
I believe the attachFilename option doesn't work for all formats.
Try using format="glueparquet". Otherwise, you could use the first example but read directly into a Spark DataFrame (and then convert it to a DynamicFrame if you want to):
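from pyspark.sql.functions import input_file_name
# Read the Parquet files, then add a column holding each row's source file path
newdf = spark.read.parquet("s3://bucket/")
newdf = newdf.withColumn("filename2", input_file_name())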
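For reference, here is a rough sketch of the two options above written as small helpers. It is only a sketch under some assumptions: the helper names, the default column name, and the frame name are made up for illustration, and I have not verified that format="glueparquet" actually honors attachFilename on read. The glue_context argument is assumed to be the GlueContext your job already creates.

```python
def read_parquet_with_filename(glue_context, s3_path, filename_column="filename"):
    """Read Parquet from S3 as a DynamicFrame and ask Glue to attach the source file name.

    Hypothetical helper: the function name and the default column name are illustrative.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_path]},
        # Suggested above as an alternative to format="parquet"; untested assumption.
        format="glueparquet",
        # attachFilename is listed in the shared format_options reference.
        format_options={"attachFilename": filename_column},
    )


def dataframe_to_dynamic_frame(glue_context, df, name="with_filename"):
    """Wrap a Spark DataFrame (e.g. newdf above) as a DynamicFrame.

    Hypothetical helper: the frame name is illustrative.
    """
    # awsglue only exists inside the Glue job runtime, so it is imported lazily.
    from awsglue.dynamicframe import DynamicFrame

    return DynamicFrame.fromDF(df, glue_context, name)
```

Inside a Glue job you would call something like read_parquet_with_filename(glueContext, "s3://bucket/", "filename2"), or dataframe_to_dynamic_frame(glueContext, newdf) if you go the DataFrame route.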
But if I read through Spark directly, would I still be able to use job bookmarks?
The documentation says that attachFilename can be used with any format. I've tested it with CSV and it works. Do you have any idea why it's different for Parquet?
This is the link to the section that covers the attachFilename option: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html#aws-glue-programming-etl-format-shared-reference