I am currently using an AWS Glue job to read data from one Amazon S3 source, perform some transformations, and write the transformed data to another S3 bucket in Parquet format. While writing to the destination bucket, I am partitioning the data on one of the fields.
I am using the following code to write the data to the destination:
partition_keys = ["partition_date"]
glueContext.write_dynamic_frame.from_catalog(
    frame=dynamic_frame,
    database=glue_catalog_db,
    table_name=glue_catalog_dest_table,
    transformation_ctx="write_dynamic_frame",
    additional_options={"partitionKeys": partition_keys},
)
Right now, I am observing the following error message in the logs:
WARN TaskSetManager: Lost task 342.0 in stage 0.0 (TID 343) (172.35.6.249 executor 10): org.apache.spark.SparkException: Task failed while writing rows.
Caused by: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service: Amazon S3; Status Code: 503; Error Code: SlowDown;
I just wanted to know whether these warnings can be ignored for now. Specifically, could this issue cause any data loss, or are these errors/warnings automatically retried?
If data loss is possible, what would be the best way to avoid this issue?
Note: the number of files to be written to the destination S3 bucket is in the billions.
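One mitigation I have been considering (not yet verified) is to repartition the data on the partition column before writing, so that each partition_date prefix receives fewer, larger files and the per-prefix S3 request rate drops. This is only a sketch reusing the variable names from my job above; the DataFrame round-trip and the name "repartitioned_frame" are my own additions:

```python
from awsglue.dynamicframe import DynamicFrame

# Convert to a Spark DataFrame so that rows sharing the same partition_date
# land in the same Spark partitions, producing fewer output files (and thus
# fewer S3 PUT requests) per destination prefix.
df = dynamic_frame.toDF().repartition("partition_date")

# Convert back to a DynamicFrame for the catalog-based writer shown above.
dynamic_frame = DynamicFrame.fromDF(df, glueContext, "repartitioned_frame")
```

I am aware this could skew work onto a few executors if a handful of dates dominate the data; passing an explicit partition count as well, e.g. `df.repartition(200, "partition_date")`, might balance that, but I have not tested either variant at this scale.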