AWS Glue Crawler errors out trying to find partitionValues.latest in a Delta Lake


I'm seeing errors like the following when trying to get a crawler to crawl a non-native Delta Lake S3 folder I have:

WARN : Cannot get schema or partition columns or partition values for Delta table: BUCKET/PATH, got exception: com.amazonaws.services.glue.exceptions.S3NoSuchKeyException: No object found for bucket: glue-dataplane-prod-us-east-1-state-tree-v2 key: d0d989b0-e5e5-4233-a4a1-286ecdee15b2/file_schemas/BUCKET/PATH/partitionValues.latest

And it's correct - there's no partitionValues.latest file in the delta lake folder. But I don't know what that file is, and I've never seen it before in my delta lakes. I also don't know what the uuid/file_schemas bit is about.
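Pulling the message apart at least shows where Glue was looking. The missing object is reported against a bucket named glue-dataplane-prod-us-east-1-state-tree-v2, not against my own bucket, and my BUCKET/PATH only appears as part of the key. I'm assuming from the name alone that this bucket belongs to the Glue service; I haven't confirmed that. A minimal sketch of the split (the helper name parse_missing_object is my own, not part of any AWS library):

```python
import re

# Warning text copied from the crawler log above (BUCKET/PATH are redactions).
WARNING = (
    "WARN : Cannot get schema or partition columns or partition values for "
    "Delta table: BUCKET/PATH, got exception: "
    "com.amazonaws.services.glue.exceptions.S3NoSuchKeyException: "
    "No object found for bucket: glue-dataplane-prod-us-east-1-state-tree-v2 "
    "key: d0d989b0-e5e5-4233-a4a1-286ecdee15b2/file_schemas/BUCKET/PATH/partitionValues.latest"
)


def parse_missing_object(message):
    """Return (bucket, key) named in a 'No object found' message, or None."""
    match = re.search(r"No object found for bucket: (\S+) key: (\S+)", message)
    return (match.group(1), match.group(2)) if match else None


bucket, key = parse_missing_object(WARNING)

# Split the key around the "file_schemas" segment to separate the UUID-like
# prefix from the part that mirrors my table location.
prefix, _, rest = key.partition("/file_schemas/")

print("bucket:    ", bucket)
print("key prefix:", prefix)
print("remainder: ", rest)
```

So the file it can't find is not one I would expect to see next to my _delta_log folder at all, which may be why I've never come across it.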

I have other delta lakes that work fine without this file, using an identical (afaict) crawler setup. Even this crawler I can sometimes get to work. It worked once on a delta lake without this file, but it gives the error for that same delta lake if I have the crawler crawl both another delta lake and the one that worked; all of a sudden neither delta lake passes muster.
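To make the two-table scenario concrete, here is a sketch of a crawler with two Delta targets, written as the keyword arguments one could pass to boto3's create_crawler. This is an illustration, not my exact configuration: the crawler name, role ARN, database name, and S3 paths are hypothetical placeholders, and setting CreateNativeDeltaTable to False is my assumption for what "non-native" means here.

```python
def build_crawler_args(name, role_arn, database, delta_table_paths,
                       write_manifest=False):
    """Build keyword arguments for a Glue crawler that targets Delta tables.

    All values passed in are placeholders; nothing here calls AWS.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "DeltaTargets": [
                {
                    "DeltaTables": list(delta_table_paths),
                    "WriteManifest": write_manifest,
                    # Assumption: False corresponds to a "non-native" table.
                    "CreateNativeDeltaTable": False,
                }
            ]
        },
    }


crawler_args = build_crawler_args(
    name="example-delta-crawler",                                # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",   # hypothetical
    database="example_db",                                       # hypothetical
    delta_table_paths=[
        "s3://BUCKET/PATH/",        # the table that crawled fine on its own
        "s3://BUCKET/OTHER_PATH/",  # the second table added afterwards
    ],
)

# To actually create the crawler (requires boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_crawler(**crawler_args)
```

With a single path in the list the crawl succeeded once; with both paths, both tables produce the warning.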

Have other folks seen this error? Is it used in a certain version of the delta lake spec I'm not using?

(I'm creating these lakes using the delta-spark python package, version 2.3.0 -- the latest at the time of writing.)

Thanks for any tips.

mikix
posted a year ago, 93 views
No answers
