AWS Glue Crawler errors out trying to find partitionValues.latest in a Delta Lake

I'm seeing errors like the following when trying to get a crawler to crawl a non-native Delta Lake S3 folder I have:

WARN : Cannot get schema or partition columns or partition values for Delta table: BUCKET/PATH, got exception: com.amazonaws.services.glue.exceptions.S3NoSuchKeyException: No object found for bucket: glue-dataplane-prod-us-east-1-state-tree-v2 key: d0d989b0-e5e5-4233-a4a1-286ecdee15b2/file_schemas/BUCKET/PATH/partitionValues.latest

And the message is correct: there's no partitionValues.latest file in the Delta Lake folder. But I don't know what that file is, and I've never seen it in my Delta lakes before. I also don't know what the uuid/file_schemas part of the key is about.
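
To make the pieces of that warning easier to look at, here is a minimal sketch that splits it into the table path, exception class, bucket, and key. It assumes the message always has the layout shown above; the helper name parse_glue_warning and the regular expression are illustrative, not part of any AWS library.

```python
import re

# The warning text, copied from the crawler log above (placeholders kept as-is).
WARNING = (
    "WARN : Cannot get schema or partition columns or partition values for Delta table: BUCKET/PATH, "
    "got exception: com.amazonaws.services.glue.exceptions.S3NoSuchKeyException: "
    "No object found for bucket: glue-dataplane-prod-us-east-1-state-tree-v2 "
    "key: d0d989b0-e5e5-4233-a4a1-286ecdee15b2/file_schemas/BUCKET/PATH/partitionValues.latest"
)

# Assumption: every such warning follows this exact layout.
PATTERN = re.compile(
    r"Delta table: (?P<table>\S+), got exception: (?P<exc>\S+): "
    r"No object found for bucket: (?P<bucket>\S+) key: (?P<key>\S+)"
)


def parse_glue_warning(line):
    """Return the named parts of the warning as a dict, or None if it does not match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


info = parse_glue_warning(WARNING)
key_parts = info["key"].split("/")  # uuid, "file_schemas", then the table path and file name

print(info["table"])   # the Delta table the crawler was reading
print(info["bucket"])  # the bucket named in the exception
print(key_parts)
```

Split this way, the bucket named in the exception (glue-dataplane-prod-us-east-1-state-tree-v2) is visibly a different name from the BUCKET/PATH of the table, and the table path only appears inside the key after the uuid and file_schemas segments. The message itself does not say what that bucket holds.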

I have other Delta lakes that work fine without this file, using an identical (afaict) crawler setup. Even this crawler sometimes works. It succeeded once on a Delta lake without this file, but it gives the error for that same Delta lake if I have the crawler crawl both it and another Delta lake. All of a sudden neither one passes muster.

Have other folks seen this error? Is this file used in a certain version of the Delta Lake spec that I'm not using?

(I'm creating these lakes using the delta-spark python package, version 2.3.0 -- the latest at the time of writing.)

Thanks for any tips.

mikix
asked a year ago, 93 views
No answers