Missing property on hive partition can cause Athena query to fail?

0

Basically i have a s3 bucket that i use to store some events, and then i use glue crawlers and athena to query the data. I was trying to use hive partitions like this: year=yyyy/month=MM/day=dd and use a compatible schema for all partitions (i saw this on the options when creating a crawler). I assume this is the intended way to use it, since querying 5 tables to get the same data because of a missing fields seems unpractical/undoable to me. The crawler creates only one table with the correct schema (i use a mock file with all possible values), but when i do this query:

SELECT * FROM "TableA" WHERE DATE(CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0'))) BETWEEN DATE('2023/08/08') AND DATE('2023/08/08') AND "TableA"."timestamp" BETWEEN 1691500345 AND 1691501245 ORDER BY "timestamp" DESC

I get this error: "HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'cloud_event_management' in table 'default.TableA' is declared as type 'boolean', but partition 'year=2023/month=08/day=03' declared column 'rule_metadata_updated_at' as type 'string'."

i have checked the table columns and they seem ok, the only thing i noticed is that in the "year=2023/month=08/day=03" partition there was not a single file containing the "cloud_event_management" property. Could this be the error?

Is there any cost effective way to remediate this?

If it helps, the s3 files are files that contains multi-line json, something like this:

" {"agent_id":"SomeAgent", "data_id":"mydataa"...} {"agent_id":"SomeAgent2", "data_id":"mydataa"...} "

질문됨 9달 전349회 조회
1개 답변
0
수락된 답변

Solution to this problem was to tell the crawler to update all partitions when crawling, is a checkbox in the "Set output and scheduling" advanced section, it says "Update all new and existing partitions with metadata from the table". Hope it saves time to someone

답변함 9달 전
profile picture
전문가
검토됨 2달 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠