Lake Formation blueprint for database ingestion fails with SPARK-31404

Hello,

I am trying to run a Lake Formation blueprint for database ingestion (Aurora PostgreSQL, working Glue connection, snapshot mode), but I get the following error:

An error occurred while calling o471.pyWriteDynamicFrame. You may get a different result due to the upgrading of Spark 3.0: writing dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z into Parquet INT96 files can be dangerous, as the files may be read by Spark 2.x or legacy versions of Hive later, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.int96RebaseModeInWrite to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during writing, to get maximum interoperability. Or set spark.sql.legacy.parquet.int96RebaseModeInWrite to 'CORRECTED' to write the datetime values as it is, if you are 100% sure that the written files will only be read by Spark 3.0+ or other systems that use Proleptic Gregorian calendar.

I've found that adding --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED would solve it. However, I cannot change the Glue ETL job (I get putObject: AccessDenied: Access Denied).

Asked a year ago, 260 views
1 answer
Hello,

I understand that you are trying to update a Glue job with the parameter --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED, but you are getting the following error:

"putObject: AccessDenied: Access Denied"

This error generally occurs when the IAM user or role used to edit the job does not have permission to upload to the S3 bucket that contains the job script. The user editing the job should have the permissions described in the following document: https://docs.aws.amazon.com/glue/latest/dg/attach-policy-iam-user.html

For this error specifically, the IAM policy needs to allow the s3:PutObject action on "arn:aws:s3:::<scriptBucketName>/*".
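
As a rough illustration of the statement described above, here is a minimal sketch that builds such a policy document in Python. The bucket name my-glue-script-bucket and the Sid value are placeholders made up for this example, not values from your account; substitute the bucket that actually holds the job script.

```python
import json


def script_bucket_put_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that allows s3:PutObject on every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGlueScriptUpload",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "my-glue-script-bucket" is a placeholder; use the real script bucket name.
policy = script_bucket_put_policy("my-glue-script-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON can be attached as an inline or managed policy to the IAM user or role that edits the job. Once the upload succeeds, the --conf parameter from the question can be added to the job as described there.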

AWS Support Engineer, answered a year ago
