Creating a Delta table from Spark using the Glue Catalog


I am trying to create a Delta table from Spark SQL using the Glue Data Catalog as the metastore. I can correctly query an existing Delta table through the Glue metastore:

%%sql
select * from `my_table` VERSION AS OF 1 limit 2 

The issues start when I try to create a new Delta Table in the same catalog.

%%sql
CREATE TABLE `my_table_v1` USING DELTA 
AS (select * from `my_table` VERSION AS OF 1 limit 2 )

##
# IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://my_s3_bucketmy_table_v1

Considering that the Glue database has the location s3://my_s3_bucket, it looks like there is some issue in computing the full path where the table files must be written (the database location and the table name appear to be concatenated without a separator).

I then tried to specify the LOCATION of the table explicitly, effectively creating an external table:

%%sql
CREATE TABLE `my_table_v1` USING DELTA 
LOCATION 's3://my_s3_bucket/my_table_v1'
AS (select * from `my_table` VERSION AS OF 1 limit 2 )

##
# IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3://my_s3_bucketmy_table_v1-__PLACEHOLDER__

This way the data files are correctly written to s3://my_s3_bucket/my_table_v1, but registering the table in the Glue database then fails.

Any suggestions?

  • I am able to use Spark SQL (CREATE OR REPLACE TABLE USING DELTA AS SELECT * FROM TempVIEW).

    Setting the database location to an appropriate S3 path will resolve the issue: in the Glue catalog, go to the database and edit it so that it has an S3 path as its location. Use Glue version 4.0 or later.

    There is no need to put a LOCATION in the Spark code. (A sketch of making the same change through the Glue API is shown below.)
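
    A minimal sketch of that change using boto3 and the Glue UpdateDatabase API; the database name my_db is a placeholder, and the trailing slash on the new location is an assumption based on the concatenated path shown in the error above:

    import boto3

    glue = boto3.client("glue")

    DATABASE_NAME = "my_db"              # placeholder: your Glue database name
    NEW_LOCATION = "s3://my_s3_bucket/"  # assumption: trailing slash so table paths join cleanly

    # Re-use the existing definition and only change the location,
    # since UpdateDatabase replaces the whole database definition.
    current = glue.get_database(Name=DATABASE_NAME)["Database"]
    database_input = {"Name": DATABASE_NAME, "LocationUri": NEW_LOCATION}
    if "Description" in current:
        database_input["Description"] = current["Description"]
    if "Parameters" in current:
        database_input["Parameters"] = current["Parameters"]

    glue.update_database(Name=DATABASE_NAME, DatabaseInput=database_input)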

Asked 9 months ago · 1,393 views

2 Answers

That is not supported. Even if you solve the path issues, the result is not a valid Delta/symlink table; it will create a generic Hive table with an array field.
The OSS Delta community is working to rectify that. In the meantime you can register Delta tables in the catalog using a Glue crawler or the DeltaTable "generate" API, but not directly via Spark SQL.
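
For the "generate" route, a minimal PySpark sketch could look like this (the path comes from the question, and spark is assumed to be an existing SparkSession with Delta Lake configured). Once the manifest exists, the table can be registered in the catalog, for example by a crawler or an external table pointing at the _symlink_format_manifest/ directory:

from delta.tables import DeltaTable

# Delta table previously written to S3 (path taken from the question).
delta_path = "s3://my_s3_bucket/my_table_v1"

# Writes symlink manifest files under <delta_path>/_symlink_format_manifest/.
delta_table = DeltaTable.forPath(spark, delta_path)
delta_table.generate("symlink_format_manifest")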

AWS · Expert · Answered 9 months ago
Expert · Reviewed 14 days ago

Thanks for confirming it. If it is not supported, the first example on this doc page https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html should be corrected, as it produces exactly the same issue. It basically attempts to do the same thing using PySpark syntax.

Example: Write a Delta Lake table to Amazon S3 and register it to the AWS Glue Data Catalog

The following AWS Glue ETL script demonstrates how to write a Delta Lake table to Amazon S3 and register the table to the AWS Glue Data Catalog.
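
For reference, the documentation example follows roughly this pattern (paraphrased, not quoted; df, the database, the table name and the path are placeholders), which is the DataFrameWriter equivalent of the CREATE TABLE ... USING DELTA statement above:

# Rough paraphrase of the documented approach; all names are placeholders.
additional_options = {"path": "s3://my_s3_bucket/my_table_v1"}

df.write \
    .format("delta") \
    .options(**additional_options) \
    .mode("append") \
    .saveAsTable("my_database.my_table_v1")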

Answered 9 months ago
