1 Answer
I believe the attachFilename option doesn't work for all formats.
Try using format="glueparquet". Otherwise, you could use the first example but read the data directly into a Spark DataFrame (and then convert it to a DynamicFrame if you want to):
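from pyspark.sql.functions import input_file_name
newdf = spark.read.parquet("s3://bucket/")
# Add a column holding the path of the source file for each row
newdf = newdf.withColumn('filename2', input_file_name())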
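For reference, the two suggestions above could be sketched roughly as follows. This is only an illustration, not a confirmed fix: the helper names (`parquet_read_kwargs`, `read_with_filename`, `to_dynamic_frame`) and the frame name `with_filename` are made up for this example, and whether `glueparquet` actually honors `attachFilename` on read is exactly what remains to be tested in this thread. The Glue-specific functions assume they run inside a Glue job where `awsglue` is available.

```python
def parquet_read_kwargs(path, filename_column, fmt="glueparquet"):
    """Build keyword arguments for glueContext.create_dynamic_frame.from_options.

    Hypothetical helper: it only assembles a dict, so it can be inspected
    outside a Glue job.
    """
    return {
        "connection_type": "s3",
        "connection_options": {"paths": [path]},
        "format": fmt,
        # attachFilename takes the name of the column that should hold the source file
        "format_options": {"attachFilename": filename_column},
    }


def read_with_filename(glue_context, path, filename_column):
    """Option 1: read through Glue and ask it to attach the file name.

    Assumption: the chosen format supports attachFilename; this is unverified.
    """
    kwargs = parquet_read_kwargs(path, filename_column)
    return glue_context.create_dynamic_frame.from_options(**kwargs)


def to_dynamic_frame(df, glue_context, name="with_filename"):
    """Option 2: convert a Spark DataFrame (e.g. newdf above) to a DynamicFrame."""
    # Imported lazily because awsglue only exists in the Glue runtime
    from awsglue.dynamicframe import DynamicFrame
    return DynamicFrame.fromDF(df, glue_context, name)
```

Inside a job, something like `to_dynamic_frame(newdf, glueContext)` would hand the DataFrame with the `filename2` column back to the rest of a DynamicFrame-based script.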
But if I read through Spark directly, would I still be able to use job bookmarks?
The documentation says that attachFilename can be used with any format. I've tested it with CSV and it works. Do you have any idea why it behaves differently for Parquet?
This is the link to the section that covers the attachFilename option: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format.html#aws-glue-programming-etl-format-shared-reference