HIVE_WRITER_CLOSE_ERROR: Error committing write parquet to Hive. Transient Error.


I'm getting an error when running a CTAS query for an Iceberg Parquet table. The query fails with "HIVE_WRITER_CLOSE_ERROR: Error committing write parquet to Hive."

However, once I delete the orphaned files in S3 and re-run the CTAS query, it succeeds. I'm trying to understand why the query fails transiently.

Example:

CREATE TABLE example_schema.example_table
WITH (
    table_type = 'iceberg',
    is_external = false,
    location = 's3://example_path',
    partitioning = ARRAY['ingest_timestamp'],
    format = 'parquet'
) AS
SELECT * FROM example_schema.example_source_table

1 Answer

Short description:

The above error usually occurs when the underlying query is failing with "AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: xxxxxxxxxxx)".

The 503 Slow Down error code typically indicates that the request rate to your S3 bucket is very high. S3 supports up to 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket. In some cases, Amazon S3 can also return a 503 Slow Down response if your requests exceed the available bandwidth.

This explains the transient nature of the problem: the query fails at times and succeeds at others. Note that the request rate against these buckets includes not only the calls Athena makes to the buckets being read from or written to, but also requests from any other AWS services or applications accessing them.

By default, S3 scales automatically to support very high request rates. As your request rate grows, S3 automatically partitions your bucket as needed to support it. This is likely why your query succeeds after some attempts.

However, while S3 is absorbing a high request volume, you can see 5xx errors asking you to slow down and retry later. Given the distributed nature of the S3 service, these errors are occasionally expected but rarely generated.

Workarounds:

[1] Retry the query after some time with exponential back-off to give S3 enough time to partition your bucket further based on your key space design. Errors can still occur during this period, but it is best to keep retrying while S3 auto-partitions your bucket based on the request rate it is receiving (see the sketch after the link below).
[2] Configure your application to gradually increase request rates.
[3] Distribute objects across multiple prefixes.
[4] Monitor the number of 5xx status error responses.

You can refer to the detailed workarounds outlined here: https://repost.aws/knowledge-center/http-5xx-errors-s3
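
As a minimal sketch of workaround [1], assuming the CTAS query is submitted through the Athena API with boto3 (the database name, workgroup, polling interval, and retry limits below are illustrative assumptions, not part of the original question or answer):

import random
import time

import boto3

athena = boto3.client("athena")

CTAS_QUERY = """
CREATE TABLE example_schema.example_table
WITH (
    table_type = 'iceberg',
    is_external = false,
    location = 's3://example_path',
    partitioning = ARRAY['ingest_timestamp'],
    format = 'parquet'
) AS
SELECT * FROM example_schema.example_source_table
"""

def run_ctas_with_backoff(max_attempts=5):
    """Retry the CTAS query with exponential back-off and jitter on transient failures."""
    for attempt in range(1, max_attempts + 1):
        query_id = athena.start_query_execution(
            QueryString=CTAS_QUERY,
            QueryExecutionContext={"Database": "example_schema"},
            WorkGroup="primary",
        )["QueryExecutionId"]

        # Poll until the query reaches a terminal state.
        while True:
            status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
            if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(5)

        if status["State"] == "SUCCEEDED":
            return query_id

        # Back off exponentially (with jitter) before retrying a transient failure.
        print(f"Attempt {attempt} failed: {status.get('StateChangeReason', '')}")
        time.sleep(min(2 ** attempt + random.random(), 60))

    raise RuntimeError("CTAS query did not succeed after retries")

Note that a failed CTAS attempt can leave orphaned data files in the table location (as observed in the question), so those may need to be cleaned up before retrying.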

Additional troubleshooting:

If you continue to see the same error message after following the workarounds, you can contact AWS Support (https://support.console.aws.amazon.com/support/home#/case/create?issueType=technical). Include the query ID along with the error message.
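
If it helps when opening the case, here is a minimal, hypothetical boto3 snippet for collecting the query ID and failure reason of a failed execution (the query ID shown is a placeholder):

import boto3

athena = boto3.client("athena")

def describe_failure(query_id):
    """Print the final state and failure reason of an Athena query, for a support case."""
    status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
    print("Query ID:", query_id)
    print("State:", status["State"])
    print("Reason:", status.get("StateChangeReason", "n/a"))

describe_failure("11111111-2222-3333-4444-555555555555")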

AWS Support Engineer
