How can Apache Hudi and Apache Iceberg environments coexist?

0

My environment is built on Apache Hudi integrated with EMR and Lake Formation, but I have found that Hudi is not well supported by Redshift or Athena. Many advanced Redshift and Athena features are available only for Apache Iceberg. How can I make my Hudi data available as Apache Iceberg so that I can use Redshift, Athena, and Lake Formation the way they work with Iceberg?

Ray Lai
Asked 5 months ago · 423 views
1 Answer
3

Hello,

As you may be aware, the two table formats differ in both metadata and data layout, so the right choice depends on the individual use case. To make a Hudi dataset available as Iceberg, you can use Spark to read the Hudi table into a DataFrame and write it out as an Iceberg table on S3. Alternatively, you can use a CTAS statement to create the Iceberg table directly from a query over the Hudi table; note that CTAS also writes new data files rather than converting the existing ones in place. Either way, the table's version history is lost after conversion. A couple of external references for your review are below.
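As a rough illustration of the two routes described above, here is a minimal PySpark sketch. It assumes the Hudi and Iceberg Spark bundles are already on the cluster's classpath and that the Iceberg catalog is backed by AWS Glue. The catalog name, bucket paths, table names, and all helper functions are hypothetical placeholders, not taken from the question. The last helper only builds an Athena-style CTAS statement as a string; whether Athena can read your particular Hudi table type should be checked first.

```python
# Sketch only. Bucket, database, table, and catalog names are hypothetical.

HUDI_META_PREFIX = "_hoodie_"  # Hudi bookkeeping columns, e.g. _hoodie_commit_time


def iceberg_spark_conf(catalog: str, warehouse: str) -> dict:
    """Spark settings for an Iceberg catalog backed by AWS Glue (assumed setup)."""
    return {
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.catalog-impl":
            "org.apache.iceberg.aws.glue.GlueCatalog",
        f"spark.sql.catalog.{catalog}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def build_session(catalog: str, warehouse: str):
    """Create a SparkSession; assumes the Hudi and Iceberg jars are available."""
    from pyspark.sql import SparkSession  # lazy import: helpers load without Spark

    builder = SparkSession.builder.appName("hudi-to-iceberg")
    for key, value in iceberg_spark_conf(catalog, warehouse).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def copy_hudi_to_iceberg(spark, hudi_path: str, iceberg_table: str) -> list:
    """Read a Hudi table snapshot and rewrite it as an Iceberg table.

    This copies the data; the Hudi commit history is not carried over.
    Returns the list of columns that were written.
    """
    df = spark.read.format("hudi").load(hudi_path)
    # Drop Hudi's metadata columns so they do not leak into the Iceberg schema.
    data_cols = [c for c in df.columns if not c.startswith(HUDI_META_PREFIX)]
    df.select(*data_cols).writeTo(iceberg_table).using("iceberg").createOrReplace()
    return data_cols


def athena_ctas_sql(source_table: str, target_table: str, location: str) -> str:
    """Build an Athena CTAS statement that creates an Iceberg table.

    SELECT * also copies the _hoodie_* columns; list columns explicitly to skip them.
    """
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (table_type = 'ICEBERG', is_external = false, location = '{location}')\n"
        f"AS SELECT * FROM {source_table}"
    )
```

On a cluster, this could be used as `copy_hudi_to_iceberg(build_session("glue_catalog", "s3://example-bucket/warehouse/"), "s3://example-bucket/hudi/orders/", "glue_catalog.analytics.orders")`. Because both routes rewrite the data, the Hudi table stays in place and would need to be re-copied or incrementally synced if it keeps receiving writes.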

This doc on Delta-to-Iceberg conversion might apply to Hudi as well.

An external solution is also available for this conversion (I have no personal experience with it).

In addition, you can use Trino on EMR to read the Hudi tables directly, which is a more performance-oriented option.

AWS
Support Engineer
Answered 5 months ago