Missing property on hive partition can cause Athena query to fail?

0

Basically i have a s3 bucket that i use to store some events, and then i use glue crawlers and athena to query the data. I was trying to use hive partitions like this: year=yyyy/month=MM/day=dd and use a compatible schema for all partitions (i saw this on the options when creating a crawler). I assume this is the intended way to use it, since querying 5 tables to get the same data because of a missing fields seems unpractical/undoable to me. The crawler creates only one table with the correct schema (i use a mock file with all possible values), but when i do this query:

SELECT * FROM "TableA" WHERE DATE(CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0'))) BETWEEN DATE('2023/08/08') AND DATE('2023/08/08') AND "TableA"."timestamp" BETWEEN 1691500345 AND 1691501245 ORDER BY "timestamp" DESC

I get this error: "HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'cloud_event_management' in table 'default.TableA' is declared as type 'boolean', but partition 'year=2023/month=08/day=03' declared column 'rule_metadata_updated_at' as type 'string'."

i have checked the table columns and they seem ok, the only thing i noticed is that in the "year=2023/month=08/day=03" partition there was not a single file containing the "cloud_event_management" property. Could this be the error?

Is there any cost effective way to remediate this?

If it helps, the s3 files are files that contains multi-line json, something like this:

" {"agent_id":"SomeAgent", "data_id":"mydataa"...} {"agent_id":"SomeAgent2", "data_id":"mydataa"...} "

質問済み 9ヶ月前349ビュー
1回答
0
承認された回答

Solution to this problem was to tell the crawler to update all partitions when crawling, is a checkbox in the "Set output and scheduling" advanced section, it says "Update all new and existing partitions with metadata from the table". Hope it saves time to someone

回答済み 9ヶ月前
profile picture
エキスパート
レビュー済み 2ヶ月前

ログインしていません。 ログイン 回答を投稿する。

優れた回答とは、質問に明確に答え、建設的なフィードバックを提供し、質問者の専門分野におけるスキルの向上を促すものです。

質問に答えるためのガイドライン

関連するコンテンツ