Schema incorrectly showing Data type of array in Glue Catalog when using Delta Lake table

I have a Delta Lake table saved in s3. I am running the following command:

spark.sql("""CREATE EXTERNAL TABLE db.my_table USING DELTA LOCATION 's3://path/to/delta/table'""")

Everything seems to work fine except when I look at the Schema in Glue Catalog it shows 1 field with column name of "col" and data type of "array". It should have two fields first_name and last_name that are both strings.

It populates correctly using a crawler but I have been asked to provide an alternative solution. How can this be done?

temp999
asked 4 months ago · 345 views
2 Answers

Accepted Answer

When the table is created with Spark SQL this way, the Glue catalog entry may not correctly reflect the table schema; however, SQL queries against the table still work, because the schema is read from the Delta Lake metadata stored in the table's S3 location.

If you would like the table schema to be populated in the Glue catalog table, you may consider creating the Delta Lake table using an Athena query. Athena infers the Delta Lake table metadata from the Delta Lake transaction log and synchronizes it with the Glue catalog. Please see the following document on how to create Delta Lake tables using Athena: https://docs.aws.amazon.com/athena/latest/ug/delta-lake-tables.html#delta-lake-tables-getting-started
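As a sketch of what that Athena DDL looks like (the database name, table name, and S3 path below are placeholders taken from the question), note that no column definitions are supplied, since Athena infers them from the Delta Lake transaction log:

```sql
-- Run in the Athena query editor (Athena engine version 3).
-- Columns are intentionally omitted: Athena reads the schema from
-- the Delta Lake transaction log and syncs it to the Glue catalog.
CREATE EXTERNAL TABLE db.my_table
LOCATION 's3://path/to/delta/table/'
TBLPROPERTIES ('table_type' = 'DELTA');
```

After the statement runs, the Glue table should show the actual columns (e.g. first_name and last_name as strings) rather than a single "col" field of type "array".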

Please note that there are no charges for DDL queries in Athena.

AWS
Support Engineer
answered 4 months ago

This is a known limitation of the library: https://github.com/delta-io/delta/issues/1679
As Davlish points out, there are alternatives, so it shouldn't be a blocker.

AWS
Expert
answered 4 months ago
