Issue with Apache Hudi and MoR snapshot tables
I want to re-raise a long-standing issue that prevents us from querying Hudi MoR realtime tables with AWS Athena.
AWS Athena snapshot queries fail if a Hudi MoR table has two or more fields that are arrays of records:
HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split s3://prod-signal-data-platform-spark-experiments/temporary/articles_hudi_merge_on_read_dev/story_published_partition_date=2022-04-29/75f5af94-8369-4b04-b850-421826ef48cc-0_24-38-3759_20220506091903372.parquet (offset=33554432, length=33554432) using org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Can't redefine: array
This query ran against the "articles" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 7a64950b-624d-4ea3-91d0-33be7a5e74b5
This corresponds to this Apache Hudi issue: https://github.com/apache/hudi/issues/3834 (fixed in Hudi 0.11.0) and requires an AWS Athena change.
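To illustrate what I believe is going on (this is my reading of the linked Hudi issue, not something AWS has confirmed): when the Parquet schema is converted to Avro, the element record of each record array appears to be given the same name, `array`, and Avro rejects a schema that defines the same record name twice. The sketch below is plain Python with no Avro or Hudi dependency. The field names (`authors`, `tags`) and the helper functions are hypothetical and only mimic the name-collision check.

```python
from collections import Counter


def record_names(schema, namespace=""):
    """Yield the full name of every record defined in an Avro-style schema dict."""
    if isinstance(schema, list):
        # A union: inspect each branch.
        for branch in schema:
            yield from record_names(branch, namespace)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            namespace = schema.get("namespace", namespace)
            name = schema["name"]
            yield f"{namespace}.{name}" if namespace else name
            for field in schema.get("fields", []):
                yield from record_names(field["type"], namespace)
        elif kind == "array":
            yield from record_names(schema["items"], namespace)
        elif kind == "map":
            yield from record_names(schema["values"], namespace)


def redefined_records(schema):
    """Return the record names that are defined more than once."""
    counts = Counter(record_names(schema))
    return sorted(name for name, count in counts.items() if count > 1)


def element(name, field_name):
    """Build a one-field element record (hypothetical helper for this sketch)."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": field_name, "type": "string"}],
    }


# Assumed shape after conversion: both element records are named "array".
colliding = {
    "type": "record",
    "name": "article",
    "fields": [
        {"name": "authors", "type": {"type": "array", "items": element("array", "name")}},
        {"name": "tags", "type": {"type": "array", "items": element("array", "label")}},
    ],
}

# The same table with distinct element record names does not collide.
distinct = {
    "type": "record",
    "name": "article",
    "fields": [
        {"name": "authors", "type": {"type": "array", "items": element("author", "name")}},
        {"name": "tags", "type": {"type": "array", "items": element("tag", "label")}},
    ],
}

print(redefined_records(colliding))
print(redefined_records(distinct))
```

A table with a single record array would define `array` only once, which would be consistent with the failure showing up only when there are two or more such fields.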
It would be great to see an AWS fix for this issue so that we can fully leverage Hudi on AWS.
Thank you for bringing this up. Could you please raise a support ticket with Athena Premium Support so that we can help you create a feature request for this issue?