How to update the Parquet format_version locally?


We are having an issue with one table in our ETL process, which stacks Parquet files.

In DMS we can set the Parquet format_version to 1_0 or 2_0. How could this change impact production, and is there a way to update the version locally? I've already downloaded the files but can't find a way to update the version.

Thanks.

Marcelo
asked 2 months ago · 97 views
1 Answer

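You can try pyarrow, which lets you choose the Parquet format version when it writes a file. Reading the downloaded file and writing it back out with a different version should do it: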

import pyarrow as pa
import pyarrow.parquet as pq

# Read Parquet file (version 2.0)
table = pq.read_table('input.parquet')

# Write Parquet file (version 1.0)
pq.write_table(table, 'output.parquet', version='1.0')

AWS
answered 2 months ago
EXPERT
reviewed 2 months ago
