Athena query to a native Delta table fails with internal error after 15 minutes of execution. What's wrong?

0

Query:

SELECT *
FROM delta_prod.behavioral_logs
LIMIT 10

This query fails with

Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience.
This query ran against the "dbt" database, unless qualified by the query. Please post the error message on our forum  or contact customer support  with Query Id: 1bc74725-3586-4f63-b6c9-e9c7bb6012f3

delta_prod.behavioral_logs is a Delta table registered in AWS Glue. How can I find out what is wrong with it?

asked 7 months ago501 views
9 Answers
0

Dear Customer,

To further investigate this issue internally, please provide me with the region you were running this query in. For example: eu-west-1.

I look forward to your response to go ahead with investigating the issue. I am here to assist you.

profile pictureAWS
Muzi_M
answered 7 months ago
0

Dear Customer,

Thank you for providing your region as requested. I was able to access the query logs in the backend. I confirmed the error below:

-===- Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience. -===-

Please note that this error occurs when there is a transient issue with the warm pool of resources where the query is executed. Such issues are usually short-lived and one of the subsequent retries of such a query should succeed. Athena being a managed service, makes use of a shared pool of resources to run queries. These resources may fail at times due to peak load or internal issues.

In these scenarios, retry of the query should work. Please update me if you continue to receive the same error. I have escalated this issue to our internal team for further investigation. I will update you on their findings. Note: I have no ETA for the service response.

I look forward to your response.

profile pictureAWS
Muzi_M
answered 7 months ago
  • Hey Muzi. The problem doesn't go away, I tried multiple times. The latest error (us-east-1):

    Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience.
    This query ran against the "dbt_prod" database, unless qualified by the query. Please post the error message on our forum  or contact customer support  with Query Id: 40b964f5-8d14-4467-a3b1-05d50cbe20b6
    
0

Dear Customer,

Thank you for reaching back. I understand that you tried re-running the query a couple of times, but the issue is persistent.

Please note that I have reached out to the Athena service team to investigate this issue and provide more insights. I don’t have an ETA from the service team, but I can assure you that as soon as they update me, I will swiftly revert to you.

I appreciate your patience and cooperation as we get to the bottom of this issue. We are here to assist you.

profile pictureAWS
Muzi_M
answered 7 months ago
  • Thanks, I'll be waiting for the answer.

0

Hello once again,

Thank you for your patience while I was engaging the Athena service team on this issue. I have an update from service team as below:

It is stated in this AWS Documentation here[1] that: -==- Because Delta Lake metadata is stored in the file system (for example, in Amazon S3) rather than in the metastore, Athena requires only the location property in AWS Glue to read from your Delta Lake tables. -==-

With that said please create your table again using one of the methods described:

  • Using 'table_type' = ' DELTA' and provide no metadata about columns[2]
  • Using 'spark.sql.sources.provider' = 'delta' and single col of type array<string> and then provide the path for the serde in Glue console.[3]

Please try create your table with respect to the above docs and query your table again. If there is anything that is not clear or you still facing any issue, please reach back to me here. I will be happy to address those issues.

reference: [1] https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html [2] https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html#delta-lake-tables-getting-started [3] https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html#delta-lake-tables-syncing-metadata

profile pictureAWS
Muzi_M
answered 7 months ago
  • With Apache Spark I created a table delta_dev. temp_behavioral_logs_2.

    • It has 'spark.sql.sources.provider' = 'delta'
    • It has a single col of type array<string> in AWS Glue

    I run the SQL query

    SELECT *
    FROM delta_dev.temp_behavioral_logs_2
    LIMIT 10
    

    This leads to error (again):

    Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience.
    This query ran against the "delta_dev" database, unless qualified by the query. Please post the error message on our forum  or contact customer support  with Query Id: 4387861c-4fb7-45ee-a0fe-dec366920a00
    

    Can you tell me why this internal error happens? What I need to do to solve it and be able to read Spark-written partitioned delta tables from Athena?

0

Hello,

I apologise for the delayed response. When doing an investigation internally using your recent Query Id: 4387861c-4fb7-45ee-a0fe-dec366920a00, it was discovered that there is no "table_type".

When creating your table, did you run something like below to create a Delta Lake table?

-==- CREATE EXTERNAL TABLE delta_dev.sample_delta_table_v2 LOCATION 's3://constructor-analytics-data/tables/delta_dev/behavioral_logs' TBLPROPERTIES ('table_type' = 'DELTA') -==-

Alternatively, I request that you run Crawler to create your delta lake table. The steps to do so are detailed in this doc here https://aws.amazon.com/blogs/big-data/crawl-delta-lake-tables-using-aws-glue-crawlers/

To answer your question: Can you tell please me why this internal error happens?

Answer: At this moment, this error could be caused by missing some properties when creating the delta lake table.

If you still face any difficulties, don't hesitate to reach back. I am here to assist you.

profile pictureAWS
Muzi_M
answered 7 months ago
0

Hey Muzi. First of all, please note that the issue I face is only related to partitioned delta tables, non-partitioned tables are queried fine by Athena. Why Athena doesn't give a concrete error message and instead responds with "internal error" response? Is there something in the logs on your end? I tried the following:

Part 1. Step 1

In this answer (link) you mentioned that Using 'spark.sql.sources.provider' = 'delta' and single col of type array<string> and then provide the path for the serde in Glue console

Delta table delta_dev.temp_behavioral_logs_3 are created by Databricks, it uses AWS Glue as metastore. When the table is created, it indeed specifies 'spark.sql.sources.provider' = 'delta' and creates a single col of type array<string>. And still, the Athena query fails after 15 minutes with internal error (4387861c-4fb7-45ee-a0fe-dec366920a00 as mentioned previously, again in 189aa17e-826c-4573-885b-8b3a85eb594c).

Part 1. Step 2

I manually edited the properties of delta_dev.temp_behavioral_logs_3 and added 'table_type' = 'DELTA' . The same result, query fails with the same internal error: 7af24625-90f9-4afe-9ea2-bfcf3173c0a2

Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience. This query ran against the "delta_dev" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 7af24625-90f9-4afe-9ea2-bfcf3173c0a2

Part 2

I tried another approach mentioned by you.

CREATE EXTERNAL TABLE delta_dev.temp_behavioral_logs_athena_1 
LOCATION 's3://constructor-analytics-data/tables/delta_dev/behavioral_logs' 
TBLPROPERTIES ('table_type' = 'DELTA');

This query timeouts, but after some time the table delta_dev.temp_behavioral_logs_athena_1 appears in Glue with 'table_type' = 'DELTA' property, but it has no columns.

SELECT *
FROM delta_dev.temp_behavioral_logs_athena_1
LIMIT 10

COLUMN_NOT_FOUND: line 1:8: SELECT * not allowed from relation that has no columns This query ran against the "delta_dev" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 233d7215-372d-44f5-8fd1-67d525ef5e6c

answered 7 months ago
0

Hello,

Thank you for your patience while I was engaging the service team and please accept my apologies for any inconvenience this issue may have caused.

I understand that the Delta Lake table fails to get queried in any manner the table is created.

Can you please try the below, if not tried:

  1. Open the AWS Glue console at https://console.aws.amazon.com/glue/.
  2. In the navigation pane, choose Data Catalog, Tables.
  3. In the list of tables, choose the link for your table.
  4. On the page for the table, choose Actions, Edit table.
  5. In the Serde parameters section, add the key path with the value s3://DOC-EXAMPLE-BUCKET/your-folder/.
  6. Choose Save. Try this and check if the columns are loaded.

If this does not work, for test, run Glue Crawler and mention Delta table. This should show the columns there. Because if there is data under the bucket the table should be queried

Also, the col which is being added, is the data at the backend in the same manner as defined in the Create Table statement ?

I am looking forward to your response. If you have any further queries or concerns, please feel free to update me through the case. I'll be more than happy to help you.

profile pictureAWS
Muzi_M
answered 6 months ago
0

Hey Muzi. The delta_dev.temp_behavioral_logs_3 already had path specified. The delta_dev.temp_behavioral_logs_athena_1 didn't, I specified it but it didn't work. But it's weird that delta_dev.temp_behavioral_logs_athena_1 lacks any settings

Enter image description here

answered 6 months ago
0

Hello,

I have replicated your use case on my end following this blog [1].

I have created a partitioned delta lake dataset in S3 using this same blog [1]. While this blog creates a non-partitioned Delta lake dataset, can I modified the below code snippet as follows to create a partitioned delta lake dataset( I Only ran "full_load" Glue job. No need to run "cdc_load" job in this blog.)

-==-

Write data as DELTA TABLE

sdf.write.format("delta").mode("overwrite").partitionBy("region_name"). save("s3://"+ args['s3_bucket']+"/delta/insurance/") -==-

With the modified code snippet, I was able to successfully query the partitioned delta lake table in Athena. I created the table in both ways, i.e., manually using Athena and through Glue Crawler, and it works in both scenarios.

So, at this point, I need to know the steps you followed to generate the partitioned delta lake dataset in S3 to reproduce the issue. May you also please follow the steps from the blog[1] and see if they cannot improve the situation.

I hope you find the above information helpful to address your issue. If you have any further queries or concerns, please feel free to update me through the case. I'll be more than happy to help you.

Reference: [1] https://aws.amazon.com/blogs/big-data/handle-upsert-data-operations-using-open-source-delta-lake-and-aws-glue/

profile pictureAWS
Muzi_M
answered 6 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions