Issue with Apache Hudi and MoR snapshot tables


I want to re-raise a long-standing issue that prevents us from querying Hudi MoR realtime tables with AWS Athena.

AWS Athena snapshot queries fail if a Hudi MoR table contains two or more array-of-record fields:

HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://prod-signal-data-platform-spark-experiments/temporary/articles_hudi_merge_on_read_dev/story_published_partition_date=2022-04-29/75f5af94-8369-4b04-b850-421826ef48cc-0_24-38-3759_20220506091903372.parquet (offset=33554432, length=33554432) using org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Can't redefine: array This query ran against the "articles" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 7a64950b-624d-4ea3-91d0-33be7a5e74b5
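
To make the triggering condition concrete, here is a minimal sketch of how one might check whether a table schema matches it. It assumes the schema is available as an Avro-style JSON document. The helper `record_array_fields` and the example field names (`authors`, `tags`) are hypothetical and not taken from our actual table or from any Hudi or Athena API.

```python
# Hypothetical helper: list top-level fields whose type is an array of records.
# Per the description above, a table with two or more such fields is affected.

def record_array_fields(schema):
    """Return names of top-level fields typed as array-of-record."""

    def is_record_array(field_type):
        # Nullable fields are written as unions, e.g. ["null", {...}].
        if isinstance(field_type, list):
            return any(is_record_array(t) for t in field_type)
        if isinstance(field_type, dict) and field_type.get("type") == "array":
            items = field_type.get("items")
            if isinstance(items, list):
                return any(
                    isinstance(i, dict) and i.get("type") == "record"
                    for i in items
                )
            return isinstance(items, dict) and items.get("type") == "record"
        return False

    return [
        f["name"] for f in schema.get("fields", []) if is_record_array(f["type"])
    ]


# Illustrative schema only (assumed field names, not our real table).
EXAMPLE_SCHEMA = {
    "type": "record",
    "name": "article",
    "fields": [
        {"name": "id", "type": "string"},
        {
            "name": "authors",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "author",
                    "fields": [{"name": "name", "type": "string"}],
                },
            },
        },
        {
            "name": "tags",
            "type": [
                "null",
                {
                    "type": "array",
                    "items": {
                        "type": "record",
                        "name": "tag",
                        "fields": [{"name": "label", "type": "string"}],
                    },
                },
            ],
        },
    ],
}

affected = record_array_fields(EXAMPLE_SCHEMA)
print(affected, "-> affected" if len(affected) >= 2 else "-> not affected")
```

The check only looks at top-level fields; whether nested array-of-record fields also trigger the error is not something this post establishes.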

This corresponds to this Apache Hudi issue: https://github.com/apache/hudi/issues/3834 (fixed in Hudi 0.11.0) and requires an AWS Athena change.

It'd be great to see an AWS fix for this issue so that we can fully leverage Hudi on AWS.

Asked 2 years ago · 518 views
1 Answer

Thank you for bringing this up. Could you please raise a support ticket with Athena Premium Support so that we can help you create a feature request for this issue?

AWS
Support Engineer
Answered 2 years ago
