1 Answer
This is a well-documented bug, but you can work around it by storing the ZSTD-compressed data in a format that Athena supports natively, such as Parquet or ORC. Athena can read ZSTD-compressed data in these formats.
When creating the table in Athena, specify the ZSTD compression in the table properties.
Example Athena CREATE TABLE statement for a Parquet table with ZSTD compression:
CREATE EXTERNAL TABLE my_table (
  col1 INT,
  col2 STRING
)
STORED AS PARQUET
LOCATION 's3://my-bucket/my-data/'
TBLPROPERTIES (
  'parquet.compression' = 'ZSTD',
  'compression_level' = '5'
);
Thank you for your comment. I do understand that Athena supports ZSTD; in fact, I use it in production. My question is about multi-frame ZSTD, as mentioned in the title. Unfortunately, when the data files are multi-frame ZSTD, this workaround does not help at all.
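For anyone who wants to confirm that a given file really is multi-frame, here is a minimal sketch that counts ZSTD frames by walking the frame and block headers as laid out in RFC 8878, without decompressing anything. The function name `count_zstd_frames` is my own and not part of any library, and the sketch assumes the whole file fits in memory.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528           # start of a regular ZSTD frame
SKIPPABLE_MAGIC_MIN = 0x184D2A50  # skippable frames use 0x184D2A50..0x184D2A5F
SKIPPABLE_MAGIC_MAX = 0x184D2A5F


def count_zstd_frames(data: bytes) -> int:
    """Count regular ZSTD frames in `data` (skippable frames are skipped).

    Hypothetical helper: it only walks headers and never decompresses.
    Raises ValueError on truncated or unrecognized input.
    """
    pos = 0
    frames = 0

    def need(n: int) -> None:
        # Make sure n more bytes are available at the current position.
        if pos + n > len(data):
            raise ValueError(f"truncated input at offset {pos}")

    while pos < len(data):
        need(4)
        (magic,) = struct.unpack_from("<I", data, pos)
        pos += 4

        if SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
            # Skippable frame: 4-byte little-endian size, then user data.
            need(4)
            (size,) = struct.unpack_from("<I", data, pos)
            pos += 4
            need(size)
            pos += size
            continue
        if magic != ZSTD_MAGIC:
            raise ValueError(f"unknown magic 0x{magic:08X} at offset {pos - 4}")

        # Frame_Header_Descriptor tells us which optional fields follow.
        need(1)
        descriptor = data[pos]
        pos += 1
        fcs_flag = descriptor >> 6
        single_segment = (descriptor >> 5) & 1
        has_checksum = (descriptor >> 2) & 1
        dict_id_flag = descriptor & 3

        header_rest = 0 if single_segment else 1   # Window_Descriptor
        header_rest += (0, 1, 2, 4)[dict_id_flag]  # Dictionary_ID
        if fcs_flag == 0:                          # Frame_Content_Size
            header_rest += 1 if single_segment else 0
        else:
            header_rest += (0, 2, 4, 8)[fcs_flag]
        need(header_rest)
        pos += header_rest

        # Blocks: 3-byte header = last-block bit, 2-bit type, 21-bit size.
        while True:
            need(3)
            block_header = int.from_bytes(data[pos:pos + 3], "little")
            pos += 3
            last_block = block_header & 1
            block_type = (block_header >> 1) & 3
            block_size = block_header >> 3
            if block_type == 3:
                raise ValueError("reserved block type")
            body = 1 if block_type == 1 else block_size  # RLE stores one byte
            need(body)
            pos += body
            if last_block:
                break

        if has_checksum:
            need(4)
            pos += 4
        frames += 1

    return frames
```

Download one of the affected objects from S3, read it with `open(path, "rb").read()`, and pass the bytes in. A result greater than 1 means the file is multi-frame, which is the case this thread is about.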
It turned out that this is a known bug in Hadoop: https://issues.apache.org/jira/browse/HDFS-14099 (Fix Version/s: 3.4.0, 3.2.3, 3.3.2).
I wonder when this fix will be implemented in Athena, and at which level.