AWS DataBrew - DeltaLake source


Hello Team,

How can we read a Delta Lake (Delta Parquet) source with DataBrew? Currently, I get the following error when trying to run a job:

An error occurred while calling o104.getDynamicFrame. s3://[REDACTED]/manifest is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]

The dataset type is Data Catalog table. Athena works fine with the same table and can fetch the data.
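A note on the error itself: the two byte lists in the message can be decoded as ASCII to see what the reader expected and what it found. The short Python sketch below does that (the helper name `decode_tail` is only illustrative). The expected bytes spell `PAR1`, the magic number that ends every Parquet file. The bytes actually found spell `uet` followed by a newline, which is consistent with a plain-text file whose last line ends in `.parquet`. That is an assumption about this particular manifest, but it would match a Delta Lake symlink manifest, which lists data file paths as text and is not itself a Parquet file.

```python
# Byte values copied from the error message in the question.
EXPECTED_TAIL = [80, 65, 82, 49]
FOUND_TAIL = [117, 101, 116, 10]


def decode_tail(byte_values):
    """Turn a list of byte values into the ASCII text they represent."""
    return bytes(byte_values).decode("ascii")


# repr() makes the trailing newline visible.
print(repr(decode_tail(EXPECTED_TAIL)))  # 'PAR1'
print(repr(decode_tail(FOUND_TAIL)))     # 'uet\n'
```

If that reading is right, the job is trying to parse the manifest text file as Parquet, while Athena follows the manifest to the real data files.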

asked a year ago, 355 views
1 Answer

Hello,

Glue Data Catalog tables based on Delta Lake are not currently supported by Glue DataBrew. As a workaround, you can use Glue ETL jobs to read and write Delta Lake tables. For more details and examples, refer to: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html
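As a rough illustration of that workaround, here is a minimal PySpark sketch for a Glue ETL job that reads the Delta table and writes a plain Parquet copy. It assumes the job is created with the `--datalake-formats` job parameter set to `delta`, as described on the linked page, which also lists any extra Spark configuration needed. The S3 paths and the function names are placeholders, not values from this thread. Writing a Parquet copy is only a suggestion: a plain Parquet dataset is a format DataBrew can read, so the copy could then be profiled there.

```python
# Sketch of a Glue ETL (PySpark) job: Delta Lake table -> plain Parquet copy.
# Assumption: the job runs with --datalake-formats set to delta.
# Both S3 paths are hypothetical placeholders; replace them with your own.
DELTA_TABLE_PATH = "s3://your-bucket/path/to/delta-table/"
PARQUET_EXPORT_PATH = "s3://your-bucket/path/to/parquet-copy/"


def export_delta_to_parquet(spark, delta_path, parquet_path):
    """Read the current snapshot of a Delta table and rewrite it as Parquet."""
    df = spark.read.format("delta").load(delta_path)
    # Overwrite so that each run replaces the previous export.
    df.write.mode("overwrite").parquet(parquet_path)
    return df


def main():
    # Imported here so the helper above can be exercised without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    export_delta_to_parquet(spark, DELTA_TABLE_PATH, PARQUET_EXPORT_PATH)
```

In the Glue job script, call `main()` at the end of the file. The export is a point-in-time copy, so the job has to be re-run (for example on a schedule) whenever the Delta table changes.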

I hope this information helps.

AWS
Ethan_H
answered a year ago
  • I would like to use DataBrew features such as visual data profiling and quality assurance, so I am looking for another solution. Thanks.
