Can native Delta Lake tables be created only through a crawler?


Hello,

AWS announced last December that Glue crawlers can now create native Delta Lake tables (https://aws.amazon.com/blogs/big-data/introducing-native-delta-lake-table-support-with-aws-glue-crawlers/). We tested this and it works fine. However, we would prefer not to use crawlers. Is a crawler currently the only way to create a native Delta Lake table? Are there plans to support this in the Glue console's table creation screen?

As a side note, it looks like the latest Terraform provider is still missing a "CreateNativeDeltaTable" option (there is an open issue for it).

Thanks.

Cheers,

Fabrice

Asked a year ago, 596 views
2 Answers

Thanks, Gonzalo, for your answer. Actually, my question is specifically about "delta" tables. I am able to create tables through the console or the CLI, and I have done it many times :)

Through the console, I cannot select a "delta" format. I can select Avro, CSV, and Parquet, among others, but not Delta. When I create a replica of the table created by the Delta source crawler, querying the replica (through Athena or a Glue job) fails with this error:

HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://.../part-00000-c3e64f54-ed8a-459c-8157-4235f25595b4.c000.snappy.parquet (offset=0, length=182912) using org.apache.hadoop.mapred.SequenceFileInputFormat: s3://com.diabeloop.dev.dta.lake/technical_logs/v3/curated_logs/resampled_information/environment=clinical/year=2022/part-00000-c3e64f54-ed8a-459c-8157-4235f25595b4.c000.snappy.parquet not a SequenceFile

By contrast, querying the delta table created by the crawler (whose classification is indeed "delta") works like a charm, in both Athena and Glue jobs.

Answered a year ago
  • I see your point. You could use the wizard, but you would then have to update the table with "ALTER TABLE" until it matches the one from the crawler. You can do this in Athena, for instance, by copying the DDL from another table.
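To see what "until it matches the one from the crawler" means in practice, one option is to diff the two Glue table definitions. The following is only a rough sketch: `diff_tables` and `fetch_table` are hypothetical helper names, the choice of fields to compare is an assumption about what usually matters for reading a table, and the boto3 call needs credentials and the real database and table names.

```python
from typing import Any, Dict, Tuple


def _flatten(table: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the fields that usually decide how a table is read."""
    sd = table.get("StorageDescriptor", {})
    serde = sd.get("SerdeInfo", {})
    flat = {
        "TableType": table.get("TableType"),
        "StorageDescriptor.Location": sd.get("Location"),
        "StorageDescriptor.InputFormat": sd.get("InputFormat"),
        "StorageDescriptor.OutputFormat": sd.get("OutputFormat"),
        "SerdeInfo.SerializationLibrary": serde.get("SerializationLibrary"),
    }
    for key, value in serde.get("Parameters", {}).items():
        flat[f"SerdeInfo.Parameters.{key}"] = value
    for key, value in table.get("Parameters", {}).items():
        flat[f"Parameters.{key}"] = value
    return flat


def diff_tables(
    reference: Dict[str, Any], candidate: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (reference_value, candidate_value)} for differing fields."""
    ref, cand = _flatten(reference), _flatten(candidate)
    return {
        key: (ref.get(key), cand.get(key))
        for key in sorted(set(ref) | set(cand))
        if ref.get(key) != cand.get(key)
    }


def fetch_table(database: str, name: str) -> Dict[str, Any]:
    """Fetch a table definition from the Glue Data Catalog (requires boto3)."""
    import boto3  # third-party; imported lazily so diff_tables works without it

    return boto3.client("glue").get_table(DatabaseName=database, Name=name)["Table"]
```

Calling `diff_tables(fetch_table(db, crawler_table), fetch_table(db, manual_table))` lists every compared field where the hand-made table deviates from the crawler's. The error quoted above mentions `SequenceFileInputFormat`, so the input format is a likely place to look first.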


Nothing stops you from creating the table yourself and doing what the crawler does, as long as you enter the right parameters and configuration.
You can do it via the console, the AWS CLI, boto3, or Athena.
It is easier to take the table definition the crawler created and use it as a template, either with "aws glue get-table" or by asking Athena for the DDL of the existing table.
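As a minimal sketch of the template approach with boto3: read the crawler-created table with `get_table`, keep only the fields that `create_table` accepts in `TableInput`, and create the new table from that. The helper names `to_table_input` and `clone_table` are hypothetical, and the key list below is an assumption that should be checked against the current Glue API reference. Whether the resulting table behaves exactly like the crawler's is not verified here.

```python
import copy
from typing import Any, Dict, Optional

# Keys assumed to be accepted by Glue's TableInput structure. get_table also
# returns read-only fields (DatabaseName, CreateTime, ...) that must be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def to_table_input(
    table: Dict[str, Any], new_name: str, new_location: Optional[str] = None
) -> Dict[str, Any]:
    """Turn a get_table 'Table' dict into a TableInput for a new table."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    table_input["Name"] = new_name
    if new_location is not None:
        # Point the copy at a different Delta table location if requested.
        table_input.setdefault("StorageDescriptor", {})["Location"] = new_location
    return table_input


def clone_table(
    database: str, source_name: str, new_name: str,
    new_location: Optional[str] = None,
) -> None:
    """Create new_name in the same database, using source_name as a template."""
    import boto3  # third-party; needs AWS credentials and Glue permissions

    glue = boto3.client("glue")
    source = glue.get_table(DatabaseName=database, Name=source_name)["Table"]
    glue.create_table(
        DatabaseName=database,
        TableInput=to_table_input(source, new_name, new_location),
    )
```

Copying the whole definition rather than retyping it in the console keeps the table parameters (including the "delta" classification) and the storage descriptor identical to what the crawler wrote.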

AWS
Expert
Answered a year ago
