Athena Iceberg table Commit error with Lambda Service

I'm trying to update an Iceberg table using the Athena client in AWS Lambda, but I'm getting a commit error. The function runs about 100 queries at a time, and a new query is generated every 1 to 2 seconds.

The error is as follows: ICEBERG_COMMIT_ERROR: Failed to commit Iceberg update to the table: . If a data manifest file was generated at 's3://bucket_name/path/manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account.

Any idea what the issue is?

asked 2 years ago, 928 views
1 Answer
Accepted Answer

Hello.

I haven't seen your code or query, so I don't know the details, but if you delete 's3://bucket_name/path/manifest.csv' as the error message suggests, does the query run?

Could you also share the query and the code you are running?

EXPERT
answered 2 years ago
  • Hello.

    When I looked for the "manifest.csv" file after this error, it did not exist. I also confirmed that the query succeeds when I re-run it.

    It is difficult to share all of the code, but the part that runs the query is as follows:

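    import boto3

    session = boto3.Session()
    athena_client = session.client('athena')

    # Start the query; Athena runs it asynchronously.
    response = athena_client.start_query_execution(
        QueryString=query_string,
        QueryExecutionContext={
            'Database': database_name
        },
        ResultConfiguration={
            'OutputLocation': 's3://bucket_name/path/'
        },
        WorkGroup='group'
    )
    query_execution_id = response['QueryExecutionId']

    # Look up the query state (the original post did not show this step;
    # in practice this call is repeated until the state is terminal).
    execution = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
    status = execution['QueryExecution']['Status']['State']

    # This excerpt sits inside a function, hence the return statements.
    if status == 'SUCCEEDED':
        print(f"[{query_execution_id}] Query SUCCEEDED!")
        results = athena_client.get_query_results(QueryExecutionId=query_execution_id)
        return results['ResultSet']['Rows']
    else:
        print(f"[{query_execution_id}] Query failed!")
        return None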
    

    I think the parallel execution is the problem. Is there any way to fix it?

  • Thank you for sharing the code. I'm not sure whether this will solve the problem directly, but how about setting the Lambda function's concurrency to 1, as described in the GitHub issue below? https://github.com/aws/aws-sdk-pandas/issues/2651#issuecomment-1955081562
    Also, can you run the queries with less parallelism, such as 10 at a time instead of 100?

  • Thank you for your reply. I will apply the concurrency limit.
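
For readers who hit the same error, here is a minimal sketch of the two ideas from this thread: running the queries one at a time, and re-running a query that fails with the commit error (the asker reported that a re-run succeeded). This is not the asker's code. The function names, the retry counts and delays, and the assumption that the error text shows up in `StateChangeReason` are all illustrative, so please verify them against your own failures.

```python
import time


def wait_for_query(athena_client, query_execution_id, poll_seconds=1.0, sleep=time.sleep):
    """Poll Athena until the query reaches a terminal state; return (state, reason)."""
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return status["State"], status.get("StateChangeReason", "")
        sleep(poll_seconds)


def run_with_commit_retry(athena_client, start_kwargs, max_attempts=5,
                          base_delay=2.0, sleep=time.sleep):
    """Run one query, retrying with exponential backoff on ICEBERG_COMMIT_ERROR.

    Returns the QueryExecutionId of the successful run, or None if no run succeeded.
    """
    for attempt in range(max_attempts):
        response = athena_client.start_query_execution(**start_kwargs)
        query_execution_id = response["QueryExecutionId"]
        state, reason = wait_for_query(athena_client, query_execution_id, sleep=sleep)
        if state == "SUCCEEDED":
            return query_execution_id
        # Assumption: the commit failure is reported in StateChangeReason.
        if "ICEBERG_COMMIT_ERROR" not in reason:
            return None  # a different failure; retrying is unlikely to help
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ... with the defaults
    return None


def run_serially(athena_client, list_of_start_kwargs, **retry_options):
    """Run the queries one after another so only one write commits at a time."""
    return [
        run_with_commit_retry(athena_client, start_kwargs, **retry_options)
        for start_kwargs in list_of_start_kwargs
    ]
```

In the Lambda function, `athena_client` would be `boto3.client('athena')` and each `start_kwargs` would be the same keyword arguments passed to `start_query_execution` in the code above. Serializing inside one invocation only helps if the function itself is not running in parallel, so the suggestion to limit the Lambda's concurrency still applies; with boto3 that can be done through the Lambda client's `put_function_concurrency(FunctionName=..., ReservedConcurrentExecutions=1)`.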
