Issue with Apache Hudi and MoR snapshot tables
I want to re-raise an issue we have been seeing for a long time, which prevents us from querying Hudi MoR realtime tables using AWS Athena.
AWS Athena snapshot queries fail if there are two or more record array fields in a Hudi MoR table:
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://prod-signal-data-platform-spark-experiments/temporary/articles_hudi_merge_on_read_dev/story_published_partition_date=2022-04-29/75f5af94-8369-4b04-b850-421826ef48cc-0_24-38-3759_20220506091903372.parquet (offset=33554432, length=33554432) using org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Can't redefine: array
This query ran against the "articles" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 7a64950b-624d-4ea3-91d0-33be7a5e74b5
This corresponds to this Apache Hudi issue: https://github.com/apache/hudi/issues/3834 (fixed in Hudi 0.11.0) and requires an AWS Athena change.
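For anyone checking whether a table might hit the same condition, here is a minimal sketch that lists the array-of-record fields in a schema. It assumes the schema is a plain dict shaped like the JSON form of a Spark `StructType` (for example from PySpark's `df.schema.jsonValue()`). The helper names, the example field names, and the choice to count nested fields are illustrative assumptions, not part of Hudi or Athena.

```python
def record_array_fields(schema, prefix=""):
    """Return dotted paths of fields whose type is an array of structs.

    Assumes `schema` is a dict in Spark's StructType JSON shape:
    {"type": "struct", "fields": [{"name": ..., "type": ...}, ...]}.
    """
    found = []
    for field in schema.get("fields", []):
        path = prefix + field["name"]
        ftype = field["type"]
        if not isinstance(ftype, dict):
            continue  # primitive types are plain strings such as "string"
        if ftype.get("type") == "array":
            elem = ftype.get("elementType")
            if isinstance(elem, dict) and elem.get("type") == "struct":
                found.append(path)
                # Assumption: nested record arrays are counted as well.
                found.extend(record_array_fields(elem, path + "."))
        elif ftype.get("type") == "struct":
            found.extend(record_array_fields(ftype, path + "."))
    return found


def likely_affected(schema):
    """True if the schema has two or more record array fields."""
    return len(record_array_fields(schema)) >= 2


# Hypothetical schema with two record array fields (names are made up).
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "authors", "type": {
            "type": "array",
            "elementType": {"type": "struct", "fields": [
                {"name": "name", "type": "string"}]}}},
        {"name": "entities", "type": {
            "type": "array",
            "elementType": {"type": "struct", "fields": [
                {"name": "label", "type": "string"}]}}},
    ],
}

print(record_array_fields(example_schema))
print(likely_affected(example_schema))
```

This only reports the schema shape described above (two or more record array fields); it does not confirm that a given Athena query will fail.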
It'd be great to see an AWS fix for this issue to allow us to fully leverage Hudi on AWS.
Thank you for bringing this up. Could you please raise a support ticket with Athena Premium Support so that we can help you create a feature request for this issue?