
Glue & Snowflake: How to perform SQL transformations


I think this may be a stupid question, but I can't figure out how to simply execute SQL raw transformation scripts on a schedule using Glue. My objective is to use Glue to schedule transformations that are all contained in Snowflake (since all the data I need already exists somewhere within Snowflake).

Simplified example of what I'd like to do (insert today's sales data from two separate sales tables into one unified sales table):

INSERT INTO DATABASE2.SCHEMA2.UNIFIED_SALES_TABLE

	SELECT
		DATE
	,	SUM(REVENUE_USD) REVENUE
	
	FROM
		DATABASE1.SCHEMA1.FIRST_SALES_TABLE
	
	WHERE
		DATE = CURRENT_DATE
	
	GROUP BY
		DATE
	
	
	UNION ALL
	
	
	SELECT
		DATE
	,	SUM(USD_REVENUE) REVENUE
	
	FROM
		DATABASE3.SCHEMA3.ANOTHER_SALES_TABLE
		
	WHERE
		DATE = CURRENT_DATE
	
	GROUP BY
		DATE
;

I've set up my Snowflake Connector within Glue, and made a connection using that Connector. I'm able to connect, but I'm looking for a place where I can simply paste my SQL so that it'll run on a schedule. Is that possible? I don't need to do any mappings, I don't need to move any data from source -> destination; I just need to execute the SQL in my Snowflake instance, but I'm not sure how / where I'd do this.

I tried just using the "SQL" transform option, which says "Custom SQL code to execute". That sounds like exactly what I want, but I get the impression it's not quite intended for what I'm trying to do. One issue in particular that I encountered with it was `AnalysisException: The namespace in session catalog must have exactly one name part:` whenever I fully qualified table names. Since I'm using different databases and schemas, I have to qualify them in order for the SQL to work.

1 Answer
Accepted Answer

Hi,

If your objective is purely to run some ELT, your most cost-effective option would be a Glue Python Shell job that imports the Snowflake Python connector. With this method you can execute your SQL code against Snowflake at the cost of 1/16 of a DPU.
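A minimal sketch of such a Python Shell job, assuming the `snowflake-connector-python` package is made available to the job (for example via the `--additional-python-modules` job parameter); all connection values below are placeholders you would replace with your own (ideally fetched from AWS Secrets Manager rather than hardcoded):

```python
# Sketch of a Glue Python Shell job that runs a Snowflake SQL script.
# The SQL is just pasted in as a string; nothing is moved through Glue.

UNIFY_SALES_SQL = """
INSERT INTO DATABASE2.SCHEMA2.UNIFIED_SALES_TABLE
SELECT DATE, SUM(REVENUE_USD) REVENUE
FROM DATABASE1.SCHEMA1.FIRST_SALES_TABLE
WHERE DATE = CURRENT_DATE
GROUP BY DATE
UNION ALL
SELECT DATE, SUM(USD_REVENUE) REVENUE
FROM DATABASE3.SCHEMA3.ANOTHER_SALES_TABLE
WHERE DATE = CURRENT_DATE
GROUP BY DATE;
"""


def run_statements(conn, sql_text):
    """Execute each semicolon-separated statement in sql_text."""
    with conn.cursor() as cur:
        for stmt in (s.strip() for s in sql_text.split(";")):
            if stmt:
                cur.execute(stmt)


def main():
    # Imported inside main so the module loads even where the
    # connector package is not installed.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",   # placeholder
        user="etl_user",        # placeholder
        password="***",         # placeholder; prefer Secrets Manager
        warehouse="ETL_WH",     # placeholder
    )
    try:
        run_statements(conn, UNIFY_SALES_SQL)
    finally:
        conn.close()


# In the actual Glue job entry point you would simply call:
# main()
```

You can then attach a Glue trigger (or EventBridge schedule) to run this job on whatever cadence you need.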

You could also import the Snowflake Python connector inside a Glue Spark ETL job, but that job would be mostly idle and you would overspend for the same operation.

The Glue Studio Connector for Snowflake should work similarly to the Snowflake Connector for Spark. The main goal of that connector is fast data exchange between Snowflake and Spark, so when writing to Snowflake it first writes to S3 and then uses the Snowflake COPY command. It offers the ability to run some pre and post SQL, but you would still need to load something into a staging table.
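For illustration, a hypothetical sketch of the Spark-side options in the style of the Snowflake Connector for Spark, where the `postactions` option carries SQL to run in Snowflake after the COPY into the staging table; all connection values are placeholders:

```python
# Hypothetical option map for a Snowflake Spark-connector write
# that runs extra SQL as a post action. Values are placeholders.

def snowflake_write_options(post_sql):
    """Assemble the options for writing a staging table plus a post action."""
    return {
        "sfURL": "my_account.snowflakecomputing.com",  # placeholder
        "sfUser": "etl_user",                          # placeholder
        "sfPassword": "***",                           # placeholder
        "sfDatabase": "DATABASE2",
        "sfSchema": "SCHEMA2",
        "sfWarehouse": "ETL_WH",
        "dbtable": "STAGING_SALES",                    # staging table
        # SQL executed in Snowflake after the COPY completes:
        "postactions": post_sql,
    }


# Inside the Glue Spark job you would then write, roughly:
# (df.write.format("net.snowflake.spark.snowflake")
#    .options(**snowflake_write_options("INSERT INTO ... SELECT ... ;"))
#    .mode("append")
#    .save())
```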

If you do some transformation in Spark, load the DataFrame to a Snowflake table, and then need to run your Snowflake SQL, the Glue Studio Connector for Snowflake with a post action would be the best choice.

The Glue Studio SQL transform runs your code as Spark SQL, and it is currently meant for ETL, not ELT. That is also why your fully qualified names failed: Spark's session catalog only accepts one namespace part (database.table), so three-part Snowflake names like DATABASE1.SCHEMA1.TABLE raise that AnalysisException.

EXPERT
answered 8 months ago
  • Thank you for clarifying. That is really helpful. I'll try going the Python Shell route.
