Missing property on hive partition can cause Athena query to fail?

0

Basically i have a s3 bucket that i use to store some events, and then i use glue crawlers and athena to query the data. I was trying to use hive partitions like this: year=yyyy/month=MM/day=dd and use a compatible schema for all partitions (i saw this on the options when creating a crawler). I assume this is the intended way to use it, since querying 5 tables to get the same data because of a missing fields seems unpractical/undoable to me. The crawler creates only one table with the correct schema (i use a mock file with all possible values), but when i do this query:

SELECT * FROM "TableA" WHERE DATE(CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0'))) BETWEEN DATE('2023/08/08') AND DATE('2023/08/08') AND "TableA"."timestamp" BETWEEN 1691500345 AND 1691501245 ORDER BY "timestamp" DESC

I get this error: "HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'cloud_event_management' in table 'default.TableA' is declared as type 'boolean', but partition 'year=2023/month=08/day=03' declared column 'rule_metadata_updated_at' as type 'string'."

i have checked the table columns and they seem ok, the only thing i noticed is that in the "year=2023/month=08/day=03" partition there was not a single file containing the "cloud_event_management" property. Could this be the error?

Is there any cost effective way to remediate this?

If it helps, the s3 files are files that contains multi-line json, something like this:

" {"agent_id":"SomeAgent", "data_id":"mydataa"...} {"agent_id":"SomeAgent2", "data_id":"mydataa"...} "

gefragt vor 9 Monaten351 Aufrufe
1 Antwort
0
Akzeptierte Antwort

Solution to this problem was to tell the crawler to update all partitions when crawling, is a checkbox in the "Set output and scheduling" advanced section, it says "Update all new and existing partitions with metadata from the table". Hope it saves time to someone

beantwortet vor 9 Monaten
profile picture
EXPERTE
überprüft vor 2 Monaten

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen