AWS Glue Crawler errors out trying to find partitionValues.latest in a Delta Lake


I'm seeing errors like the following when trying to get a crawler to crawl a non-native Delta Lake S3 folder I have:

WARN : Cannot get schema or partition columns or partition values for Delta table: BUCKET/PATH, got exception: com.amazonaws.services.glue.exceptions.S3NoSuchKeyException: No object found for bucket: glue-dataplane-prod-us-east-1-state-tree-v2 key: d0d989b0-e5e5-4233-a4a1-286ecdee15b2/file_schemas/BUCKET/PATH/partitionValues.latest

And it's correct - there's no partitionValues.latest file in the delta lake folder. But I don't know what that file is, and I've never seen it before in my delta lakes. I also don't know what the uuid/file_schemas bit is about.
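
To show exactly which parts of the message I mean, here is a minimal sketch (standard-library Python only; the function name `parse_missing_object` is my own, purely for illustration) that splits the warning into the table path, the bucket, and the key. I'm only reading the message text here and assuming the pieces always appear in this order; I don't know what Glue actually stores under that key.

```python
import re

# The warning text copied verbatim from the crawler log above.
WARNING = (
    "WARN : Cannot get schema or partition columns or partition values for "
    "Delta table: BUCKET/PATH, got exception: "
    "com.amazonaws.services.glue.exceptions.S3NoSuchKeyException: "
    "No object found for bucket: glue-dataplane-prod-us-east-1-state-tree-v2 "
    "key: d0d989b0-e5e5-4233-a4a1-286ecdee15b2/file_schemas/BUCKET/PATH/"
    "partitionValues.latest"
)

# Assumed layout of the message, based only on the one example I have.
PATTERN = re.compile(
    r"Delta table: (?P<table>[^,\s]+), got exception: .*?"
    r"No object found for bucket: (?P<bucket>\S+) key: (?P<key>\S+)"
)


def parse_missing_object(warning: str) -> dict:
    """Hypothetical helper: pull table path, bucket, and key out of the warning."""
    match = PATTERN.search(warning)
    if match is None:
        raise ValueError("warning does not look like the S3NoSuchKeyException message")
    parts = match.groupdict()
    # Split the key around the 'file_schemas' segment I don't understand.
    prefix, _, remainder = parts["key"].partition("/file_schemas/")
    parts["key_prefix"] = prefix
    parts["key_remainder"] = remainder
    return parts


parsed = parse_missing_object(WARNING)
for name, value in parsed.items():
    print(f"{name}: {value}")
```

Pulled apart like this, the bucket named in the exception is not my BUCKET, and my table path only shows up inside the key, after the uuid and `file_schemas` segments.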

I have other Delta lakes that work fine without this file, using an identical (as far as I can tell) crawler setup. Even this crawler works some of the time: it succeeded once on a Delta lake without this file, but if I then have it crawl a second Delta lake along with the one that worked, it reports the error for both, and suddenly neither Delta lake passes muster.

Have other folks seen this error? Is this file used in a certain version of the Delta Lake spec that I'm not using?

(I'm creating these lakes using the delta-spark python package, version 2.3.0 -- the latest at the time of writing.)

Thanks for any tips.

mikix
asked a year ago, 89 views
No Answers
