AWS Data Wrangler Athena Query Failure

0

I'm attempting to run an Athena query using AWS Data Wrangler which is failing with an error indicating a column cannot be resolved. However when I view the failing query in the Athena console and rerun it there, it succeeds. I have a feeling it has something to do with the formatting with table or column names but I've been unsuccessful finding a work around. When I attempt the same query using vanilla boto3, it also fails with the same error. Here is an example of the code which fails followed by the same successful query ran in the console. (with some names anonymized for security) Adding to the strangeness further, if I remove the joins, it will error indicating the first column in the select cannot be resolved but the same query runs successfully in the console.

Python Code

import awswrangler as wr

query = """
SELECT 
    t1.description, 
    t1.model_number as model, 
    t1.p_id as p_id, 
    t1.created_at as createdAt, 
    t1.updated_at as updatedAt, 
    t3.description as platform, 
    t2.name as module
FROM 
    database_with_underscores.first_table t1 
LEFT JOIN
    database_with_underscores.other_table t2
ON
    t2.id = t1.module_id
LEFT JOIN
    database_with_underscores.lasttable t3
ON  
    t1.platform_id = t3.id
"""

df = wr.athena.read_sql_query(query, workgroup="primary", database="database_with_underscores")

Python Result

awswrangler.exceptions.QueryFailed: SYNTAX_ERROR: line 19:5: Column 't2.id' cannot be resolved

AWS Athena Console

SELECT 
    t1.description, 
    t1.model_number as model, 
    t1.p_id as p_id, 
    t1.created_at as createdAt, 
    t1.updated_at as updatedAt, 
    t3.description as platform, 
    t2.name as module
FROM 
    database_with_underscores.first_table t1 
LEFT JOIN
    database_with_underscores.other_table t2
ON
    t2.id = t1.module_id
LEFT JOIN
    database_with_underscores.lasttable t3
ON  
    t1.platform_id = t3.id

Console Result

Successful

Link to a diff of the execution where you can see the Query is identical but one succeeds and the other fails https://imgur.com/a/V9QSbIn

CoryMcC
asked 2 years ago1621 views
2 Answers
0
Accepted Answer

After much trail and error, I came to find out the true source of the issue was due to a difference in permissions between my console account and CLI account. Adding the select/describe permissions to my CLI account immediately resolved the error. The error message regarding the column not being resolved was misleading in this situation. Further interesting was this did not generate any sort of Unauthorized event in our CloudTrail.

CoryMcC
answered 2 years ago
0

Please review the column names in the respective tables. It looks like the ON condition is not same at both places. Python code is referring t2.id while console query refers t1.id and same anomaly for the module_id column.

Python Code:

t1 LEFT JOIN t2
ON t2.id = t1.module_id

Athena Console Query:

t1 LEFT JOIN t2
ON t1.id = t2.module_id```
profile pictureAWS
awsanil
answered 2 years ago
  • Ah yes, that was a bad copy/paste. In the source code prior to me anonymizing it, they're both identical yet the same issues persists. The both are in fact t2.id = t1.module_id Also for further clarification, when the query fails via Python, I go the console, click the failed query to load it and hit run without edits and it succeeds. I'm not actually copying/pasting anything between the two systems.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions