Dear Customer,
To further investigate this issue internally, please provide me with the region you were running this query in. For example: eu-west-1.
I look forward to your response so that I can proceed with investigating the issue. I am here to assist you.
Dear Customer,
Thank you for providing your region as requested. I was able to access the query logs in the backend. I confirmed the error below:
-===- Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience. -===-
Please note that this error occurs when there is a transient issue with the warm pool of resources where the query is executed. Such issues are usually short-lived, and one of the subsequent retries of the query should succeed. Because Athena is a managed service, it makes use of a shared pool of resources to run queries. These resources may occasionally fail due to peak load or internal issues.
In these scenarios, retrying the query should work. Please update me if you continue to receive the same error. I have escalated this issue to our internal team for further investigation and will update you on their findings. Note: I have no ETA for the service response.
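The retry advice above can be sketched as a small helper (a minimal illustration only, not an official AWS recommendation; the function name and defaults are my own):

```python
import time

def retry_transient(fn, max_attempts=4, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on failure.

    Transient Athena internal errors are usually short-lived,
    so a later retry of the same query often succeeds.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:  # in practice, catch Athena's error type
            last_error = err
            if attempt < max_attempts - 1:
                # back off 1s, 2s, 4s, ... between attempts
                time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

With boto3, `fn` could wrap `athena.start_query_execution` plus polling `get_query_execution` until the query state is SUCCEEDED or FAILED.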
I look forward to your response.
Hey Muzi. The problem doesn't go away, I tried multiple times. The latest error (us-east-1):
Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience. This query ran against the "dbt_prod" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 40b964f5-8d14-4467-a3b1-05d50cbe20b6
Dear Customer,
Thank you for reaching back. I understand that you tried re-running the query a couple of times, but the issue persists.
Please note that I have reached out to the Athena service team to investigate this issue and provide more insights. I don’t have an ETA from the service team, but I can assure you that as soon as they update me, I will promptly get back to you.
I appreciate your patience and cooperation as we get to the bottom of this issue. We are here to assist you.
Thanks, I'll be waiting for the answer.
Hello once again,
Thank you for your patience while I was engaging the Athena service team on this issue. I have the following update from the service team:
It is stated in this AWS Documentation here[1] that: -==- Because Delta Lake metadata is stored in the file system (for example, in Amazon S3) rather than in the metastore, Athena requires only the location property in AWS Glue to read from your Delta Lake tables. -==-
With that said please create your table again using one of the methods described:
- Using 'table_type' = 'DELTA' and providing no metadata about columns[2]
- Using 'spark.sql.sources.provider' = 'delta' and single col of type array<string> and then provide the path for the serde in Glue console.[3]
Please try creating your table according to the docs above and query it again. If anything is unclear or you are still facing an issue, please reach back to me here. I will be happy to address those issues.
References:
[1] https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html
[2] https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html#delta-lake-tables-getting-started
[3] https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html#delta-lake-tables-syncing-metadata
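As a rough illustration of what the Glue table entry for method [3] contains, the pieces can be assembled like this (a sketch only; the helper name is mine, and the authoritative list of parameters Athena expects is in the linked docs):

```python
def delta_glue_table_input(name, s3_path):
    """Sketch of a Glue TableInput following method [3]: the Spark
    provider marker, a single array<string> column, and the table
    path in the SerDe parameters."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"spark.sql.sources.provider": "delta"},
        "StorageDescriptor": {
            "Location": s3_path,
            # single placeholder column, as the docs describe
            "Columns": [{"Name": "col", "Type": "array<string>"}],
            "SerdeInfo": {"Parameters": {"path": s3_path}},
        },
    }

# The dict could then be passed to boto3's Glue client, e.g.:
# boto3.client("glue").create_table(
#     DatabaseName="delta_dev",
#     TableInput=delta_glue_table_input("my_table", "s3://my-bucket/my-path/"))
```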
With Apache Spark I created a table delta_dev.temp_behavioral_logs_2.
- It has 'spark.sql.sources.provider' = 'delta'
- It has a single col of type array<string> in AWS Glue
I ran the SQL query:
SELECT * FROM delta_dev.temp_behavioral_logs_2 LIMIT 10
This leads to error (again):
Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience. This query ran against the "delta_dev" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 4387861c-4fb7-45ee-a0fe-dec366920a00
Can you tell me why this internal error happens? What I need to do to solve it and be able to read Spark-written partitioned delta tables from Athena?
Hello,
I apologise for the delayed response. When investigating internally using your recent Query Id: 4387861c-4fb7-45ee-a0fe-dec366920a00, it was discovered that the table has no "table_type" property.
When creating your table, did you run something like below to create a Delta Lake table?
-==-
CREATE EXTERNAL TABLE delta_dev.sample_delta_table_v2
LOCATION 's3://constructor-analytics-data/tables/delta_dev/behavioral_logs'
TBLPROPERTIES ('table_type' = 'DELTA')
-==-
Alternatively, I request that you run an AWS Glue Crawler to create your Delta Lake table. The steps to do so are detailed in this doc: https://aws.amazon.com/blogs/big-data/crawl-delta-lake-tables-using-aws-glue-crawlers/
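For the crawler route, a boto3 sketch might look like the following (the crawler name, role ARN, and paths are placeholders; Glue crawlers support Delta Lake through the DeltaTargets target type):

```python
def delta_crawler_targets(delta_table_paths):
    """Build the Targets section of a Glue crawler definition
    that points at Delta Lake table locations in S3."""
    return {
        "DeltaTargets": [{
            "DeltaTables": list(delta_table_paths),
            "WriteManifest": False,
        }]
    }

# Hypothetical usage with boto3 (names are placeholders):
# boto3.client("glue").create_crawler(
#     Name="delta-dev-crawler",
#     Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
#     DatabaseName="delta_dev",
#     Targets=delta_crawler_targets(
#         ["s3://constructor-analytics-data/tables/delta_dev/behavioral_logs/"]),
# )
```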
To answer your question: "Can you tell me why this internal error happens?"
Answer: At this moment, it appears this error could be caused by missing properties when creating the Delta Lake table.
If you still face any difficulties, don't hesitate to reach back. I am here to assist you.
Hey Muzi. First of all, please note that the issue I face is related only to partitioned Delta tables; non-partitioned tables are queried fine by Athena. Why doesn't Athena give a concrete error message instead of a generic "internal error" response? Is there something in the logs on your end? I tried the following:
Part 1. Step 1
In this answer (link) you mentioned: "Using 'spark.sql.sources.provider' = 'delta' and single col of type array<string> and then provide the path for the serde in Glue console".
The Delta table delta_dev.temp_behavioral_logs_3 was created by Databricks, which uses AWS Glue as its metastore. When the table is created, it indeed specifies 'spark.sql.sources.provider' = 'delta' and creates a single col of type array<string>. And still, the Athena query fails after 15 minutes with an internal error (4387861c-4fb7-45ee-a0fe-dec366920a00 as mentioned previously, and again in 189aa17e-826c-4573-885b-8b3a85eb594c).
Part 1. Step 2
I manually edited the properties of delta_dev.temp_behavioral_logs_3 and added 'table_type' = 'DELTA'. The same result: the query fails with the same internal error (Query Id: 7af24625-90f9-4afe-9ea2-bfcf3173c0a2):
Amazon Athena experienced an internal error while executing this query. Please try submitting the query again and if the issue reoccurs, contact AWS support for further assistance. We apologize for the inconvenience. This query ran against the "delta_dev" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 7af24625-90f9-4afe-9ea2-bfcf3173c0a2
Part 2
I tried another approach mentioned by you.
CREATE EXTERNAL TABLE delta_dev.temp_behavioral_logs_athena_1
LOCATION 's3://constructor-analytics-data/tables/delta_dev/behavioral_logs'
TBLPROPERTIES ('table_type' = 'DELTA');
This query times out, but after some time the table delta_dev.temp_behavioral_logs_athena_1 appears in Glue with the 'table_type' = 'DELTA' property. However, it has no columns.
SELECT *
FROM delta_dev.temp_behavioral_logs_athena_1
LIMIT 10
COLUMN_NOT_FOUND: line 1:8: SELECT * not allowed from relation that has no columns This query ran against the "delta_dev" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 233d7215-372d-44f5-8fd1-67d525ef5e6c
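One way to spot this situation before running the query is to inspect the Glue table definition for the properties Athena needs. A sketch (the checks are my own; the dict shape follows the 'Table' structure returned by Glue's GetTable API):

```python
def diagnose_delta_table(table):
    """Return a list of likely problems in a Glue 'Table' dict
    (as returned by Glue's GetTable API) for Athena Delta reads."""
    problems = []
    params = table.get("Parameters", {})
    sd = table.get("StorageDescriptor", {})
    is_delta = (params.get("table_type", "").upper() == "DELTA"
                or params.get("spark.sql.sources.provider") == "delta")
    if not is_delta:
        problems.append("no Delta marker in table parameters")
    if not sd.get("Columns"):
        problems.append("table has no columns (SELECT * will fail)")
    if not sd.get("Location"):
        problems.append("no S3 location set")
    return problems

# Hypothetical usage with boto3:
# table = boto3.client("glue").get_table(
#     DatabaseName="delta_dev",
#     Name="temp_behavioral_logs_athena_1")["Table"]
# print(diagnose_delta_table(table))
```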
Hello,
Thank you for your patience while I was engaging the service team and please accept my apologies for any inconvenience this issue may have caused.
I understand that the Delta Lake table fails to be queried regardless of how the table is created.
Can you please try the below, if not tried:
- Open the AWS Glue console at https://console.aws.amazon.com/glue/.
- In the navigation pane, choose Data Catalog, Tables.
- In the list of tables, choose the link for your table.
- On the page for the table, choose Actions, Edit table.
- In the Serde parameters section, add the key path with the value s3://DOC-EXAMPLE-BUCKET/your-folder/.
- Choose Save.
Try this and check whether the columns are loaded.
If this does not work, as a test, run a Glue Crawler and point it at the Delta table. This should show the columns there, because if there is data under the bucket, the table should be queryable.
Also, for the column that is being added, does the data at the backend match what is defined in the CREATE TABLE statement?
I am looking forward to your response. If you have any further queries or concerns, please feel free to update me through the case. I'll be more than happy to help you.
Hey Muzi. The delta_dev.temp_behavioral_logs_3 table already had path specified. The delta_dev.temp_behavioral_logs_athena_1 table didn't; I specified it, but it didn't work. It's weird, though, that delta_dev.temp_behavioral_logs_athena_1 lacks any settings.
Hello,
I have replicated your use case on my end following this blog [1].
I have created a partitioned Delta Lake dataset in S3 using this same blog [1]. While the blog creates a non-partitioned Delta Lake dataset, I modified the code snippet below as follows to create a partitioned one. (I only ran the "full_load" Glue job; there is no need to run the "cdc_load" job from the blog.)
-==-
# Write data as a Delta table, partitioned by region_name
sdf.write.format("delta").mode("overwrite").partitionBy("region_name").save("s3://" + args['s3_bucket'] + "/delta/insurance/")
-==-
With the modified code snippet, I was able to successfully query the partitioned delta lake table in Athena. I created the table in both ways, i.e., manually using Athena and through Glue Crawler, and it works in both scenarios.
So, at this point, I need to know the steps you followed to generate the partitioned Delta Lake dataset in S3 so that I can reproduce the issue. Could you also please follow the steps from the blog [1] and see whether they resolve the situation?
I hope you find the above information helpful to address your issue. If you have any further queries or concerns, please feel free to update me through the case. I'll be more than happy to help you.
Reference: [1] https://aws.amazon.com/blogs/big-data/handle-upsert-data-operations-using-open-source-delta-lake-and-aws-glue/
It's us-east-1