AWS Data Wrangler Athena Query Failure

0

I'm attempting to run an Athena query using AWS Data Wrangler which is failing with an error indicating a column cannot be resolved. However when I view the failing query in the Athena console and rerun it there, it succeeds. I have a feeling it has something to do with the formatting with table or column names but I've been unsuccessful finding a work around. When I attempt the same query using vanilla boto3, it also fails with the same error. Here is an example of the code which fails followed by the same successful query ran in the console. (with some names anonymized for security) Adding to the strangeness further, if I remove the joins, it will error indicating the first column in the select cannot be resolved but the same query runs successfully in the console.

Python Code

import awswrangler as wr

query = """
SELECT 
    t1.description, 
    t1.model_number as model, 
    t1.p_id as p_id, 
    t1.created_at as createdAt, 
    t1.updated_at as updatedAt, 
    t3.description as platform, 
    t2.name as module
FROM 
    database_with_underscores.first_table t1 
LEFT JOIN
    database_with_underscores.other_table t2
ON
    t2.id = t1.module_id
LEFT JOIN
    database_with_underscores.lasttable t3
ON  
    t1.platform_id = t3.id
"""

df = wr.athena.read_sql_query(query, workgroup="primary", database="database_with_underscores")

Python Result

awswrangler.exceptions.QueryFailed: SYNTAX_ERROR: line 19:5: Column 't2.id' cannot be resolved

AWS Athena Console

SELECT 
    t1.description, 
    t1.model_number as model, 
    t1.p_id as p_id, 
    t1.created_at as createdAt, 
    t1.updated_at as updatedAt, 
    t3.description as platform, 
    t2.name as module
FROM 
    database_with_underscores.first_table t1 
LEFT JOIN
    database_with_underscores.other_table t2
ON
    t2.id = t1.module_id
LEFT JOIN
    database_with_underscores.lasttable t3
ON  
    t1.platform_id = t3.id

Console Result

Successful

Link to a diff of the execution where you can see the Query is identical but one succeeds and the other fails https://imgur.com/a/V9QSbIn

CoryMcC
已提問 2 年前檢視次數 1622 次
2 個答案
0
已接受的答案

After much trail and error, I came to find out the true source of the issue was due to a difference in permissions between my console account and CLI account. Adding the select/describe permissions to my CLI account immediately resolved the error. The error message regarding the column not being resolved was misleading in this situation. Further interesting was this did not generate any sort of Unauthorized event in our CloudTrail.

CoryMcC
已回答 2 年前
0

Please review the column names in the respective tables. It looks like the ON condition is not same at both places. Python code is referring t2.id while console query refers t1.id and same anomaly for the module_id column.

Python Code:

t1 LEFT JOIN t2
ON t2.id = t1.module_id

Athena Console Query:

t1 LEFT JOIN t2
ON t1.id = t2.module_id```
profile pictureAWS
awsanil
已回答 2 年前
  • Ah yes, that was a bad copy/paste. In the source code prior to me anonymizing it, they're both identical yet the same issues persists. The both are in fact t2.id = t1.module_id Also for further clarification, when the query fails via Python, I go the console, click the failed query to load it and hit run without edits and it succeeds. I'm not actually copying/pasting anything between the two systems.

您尚未登入。 登入 去張貼答案。

一個好的回答可以清楚地回答問題並提供建設性的意見回饋,同時有助於提問者的專業成長。

回答問題指南