Saving results of a query which uses an external function into a new table in Athena v.3

2

I'm following this tutorial to perform text analytics using UDFs. https://aws.amazon.com/blogs/machine-learning/translate-and-analyze-text-using-sql-functions-with-amazon-athena-amazon-translate-and-amazon-comprehend/

The example where data are not saved works fine work fine in Athena v.3 USING EXTERNAL FUNCTION detect_sentiment_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf', EXTERNAL FUNCTION detect_entities_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf' SELECT *, detect_sentiment_all(review_body, language) AS sentiment, detect_entities_all(review_body, language) AS entities FROM amazon_reviews_with_language WHERE language IN ('ar', 'hi', 'ko', 'zh-TW', 'ja', 'zh', 'de', 'pt', 'en', 'it', 'fr', 'es')

However, the example where a table is created to save the results does not work in Athena v.3, but it works in Athena v.2.

CREATE TABLE amazon_reviews_with_text_analysis WITH (format='parquet') AS USING EXTERNAL FUNCTION detect_sentiment_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf', EXTERNAL FUNCTION detect_entities_all(col1 VARCHAR, lang VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf' SELECT *, detect_sentiment_all(review_body, language) AS sentiment, detect_entities_all(review_body, language) AS entities FROM amazon_reviews_with_language WHERE language IN ('ar', 'hi', 'ko', 'zh-TW', 'ja', 'zh', 'de', 'pt', 'en', 'it', 'fr', 'es')

It fails with an error: line 2:1: mismatched input 'USING'. Expecting: '(', <query>

The question is how can I save data into a table when using external function in Athena v.3?

MikeW
asked a year ago342 views
1 Answer
0

Hello,

Please put the 'USING' clause before the 'CREATE TABLE'

Please ensure that the database has a location associated in AWS Glue.

For example:

USING EXTERNAL FUNCTION detect_dominant_language(col1 VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf'

CREATE TABLE amazon_reviews_with_language WITH ( format='parquet' ) AS

SELECT *, detect_dominant_language(review_body) AS language

FROM amazon_reviews_parquet

LIMIT 5000

References:

https://docs.aws.amazon.com/athena/latest/ug/create-table.html

https://docs.amazonaws.cn/en_us/athena/latest/ug/create-table-as.html

https://docs.aws.amazon.com/athena/latest/ug/alter-table-set-location.html

AWS
SUPPORT ENGINEER
answered a year ago
  • Thanks, mate. Appreciate the suggestion. I'm using the database that was created previously. The location was not set in the Glue Catalog, so I set it. Then I moved 'USING' clause before 'CREATE TABLE' and now I get the following error: GENERIC_INTERNAL_ERROR: Can not create a Path from an empty string

    I also tried adding a location in the 'CREATE TABLE' statement like this:

    USING EXTERNAL FUNCTION detect_dominant_language(col1 VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf' CREATE TABLE amazon_reviews_with_language WITH ( format='parquet' ) LOCATION 's3://my-bucket/db_path/amazon_reviews_with_language' AS SELECT *, detect_dominant_language(review_body) AS language FROM amazon_reviews_parquet LIMIT 5000

    But it is not working in Athena v.3: Error: mismatched input 'LOCATION'. Expecting: '%', ')', '*', '+', ',', '-', '.', '/', 'AND', 'AS', 'AT', 'OR', '[', '||', <EOF>, <predicate>

  • I'm also getting errors in Engine V3 with a Lambda UDF + CTAS statement. My current workaround is to revert to Engine V2 and place USING EXTERNAL FUNCTION after CREATE TABLE AS.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions