Querying LZ4 compressed file on s3 using AWS Athena


Hi, I have created an external table in an AWS Glue Data Catalog database.

The table points to an LZ4-compressed file on S3.

The table definition looks like this:

CREATE EXTERNAL TABLE `myapplogs`(
  `timestamp` string COMMENT 'from deserializer', 
  `num` string COMMENT 'from deserializer', 
  `num2` string COMMENT 'from deserializer', 
  `num3` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'com.amazonaws.glue.serde.GrokSerDe' 
WITH SERDEPROPERTIES ( 
  'input.format'='%{TIMESTAMP_ISO8601:timestamp} %{INT:num} %{INT:num2} %{INT:num3}') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://tesbucket/original/'
TBLPROPERTIES (
  'grokPattern'='%{TIMESTAMP_ISO8601:timestamp} %{INT:num} %{INT:num2} %{INT:num3}', 
  'typeOfData'='file')

The table gets created successfully, but SELECT queries do not return any data.

Pradeep
asked a month ago, 289 views
1 Answer

The issue could be with which LZ4 format was used to compress the file.

LZ4 can store compressed data in two formats:

  1. Block format: The legacy format is a simple block-based compression format in which each block of compressed data is standalone and carries no header or framing information. It represents the compressed data directly, without any additional metadata.

  2. Framing format: The standard format, also known as LZ4 framing, adds a framing mechanism in which each block of compressed data is preceded by a small header containing metadata and framing information. This header enables extra features such as recording the size of the original uncompressed data, optional content checksums, and other parameters.

The lz4 command-line utility writes the framing format, which Athena does not currently support. Please check which format was used to compress your file.
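One way to check is to look at the first four bytes of the file. A file in the standard framing format begins with the magic number 0x184D2204, stored little-endian. The snippet below is only a minimal sketch: the function names and the sample file name are made up for illustration, and it assumes you have already downloaded a copy of the object from S3 to local disk.

```python
import struct

# Magic number that opens a standard LZ4 frame (stored little-endian on disk).
LZ4_FRAME_MAGIC = 0x184D2204


def is_lz4_frame(header: bytes) -> bool:
    """Return True if the first four bytes match the LZ4 frame magic number."""
    if len(header) < 4:
        return False
    (magic,) = struct.unpack("<I", header[:4])
    return magic == LZ4_FRAME_MAGIC


def file_is_lz4_frame(path: str) -> bool:
    """Read the first four bytes of a local file and test them."""
    with open(path, "rb") as f:
        return is_lz4_frame(f.read(4))


if __name__ == "__main__":
    # Hypothetical local copy of the S3 object; replace with your own path.
    sample = "myapplogs.lz4"
    try:
        print(sample, "uses LZ4 framing:", file_is_lz4_frame(sample))
    except FileNotFoundError:
        print("Download the object first, then point 'sample' at it.")
```

If this reports True, the file was written in the framing format described in point 2. A False result only means the frame header is absent; it does not by itself confirm that the file is in a format Athena can read.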

AWS
Anu_C
answered a month ago
