Missing property on hive partition can cause Athena query to fail?

0

Basically i have a s3 bucket that i use to store some events, and then i use glue crawlers and athena to query the data. I was trying to use hive partitions like this: year=yyyy/month=MM/day=dd and use a compatible schema for all partitions (i saw this on the options when creating a crawler). I assume this is the intended way to use it, since querying 5 tables to get the same data because of a missing fields seems unpractical/undoable to me. The crawler creates only one table with the correct schema (i use a mock file with all possible values), but when i do this query:

SELECT * FROM "TableA" WHERE DATE(CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0'))) BETWEEN DATE('2023/08/08') AND DATE('2023/08/08') AND "TableA"."timestamp" BETWEEN 1691500345 AND 1691501245 ORDER BY "timestamp" DESC

I get this error: "HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'cloud_event_management' in table 'default.TableA' is declared as type 'boolean', but partition 'year=2023/month=08/day=03' declared column 'rule_metadata_updated_at' as type 'string'."

i have checked the table columns and they seem ok, the only thing i noticed is that in the "year=2023/month=08/day=03" partition there was not a single file containing the "cloud_event_management" property. Could this be the error?

Is there any cost effective way to remediate this?

If it helps, the s3 files are files that contains multi-line json, something like this:

" {"agent_id":"SomeAgent", "data_id":"mydataa"...} {"agent_id":"SomeAgent2", "data_id":"mydataa"...} "

asked 9 months ago333 views
1 Answer
0
Accepted Answer

Solution to this problem was to tell the crawler to update all partitions when crawling, is a checkbox in the "Set output and scheduling" advanced section, it says "Update all new and existing partitions with metadata from the table". Hope it saves time to someone

answered 9 months ago
profile picture
EXPERT
reviewed a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions