HIVE_UNKNOWN_ERROR: Duplicate key string

I am trying to run a simple query in Athena on a Hive table stored in S3.

SELECT * FROM "database"."table" limit 10;

But I get the following error: HIVE_UNKNOWN_ERROR: Duplicate key string

These are the characteristics of the table:

Classification: parquet

Input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Serde serialization lib: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

EXTERNAL: true

has_encrypted_data: false

compressionType: none

typeOfData: file

Any ideas?

Asked 2 years ago, viewed 2127 times
2 Answers

Hello,

From the error message, it looks like your Parquet data has duplicate keys. With JSON data, when the duplicates come from the naming convention (some fields in uppercase and some in lowercase), the usual fix is to set the SerDe property 'case.insensitive'='false' for 'org.openx.data.jsonserde.JsonSerDe'. See reference 1 for more details.
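As a rough way to check whether a case-only naming clash is even plausible for your table, the sketch below groups column names that become identical once lowercased. It is only an illustration: `find_case_collisions` and the sample column names are hypothetical, and it assumes you have already obtained the column names from the Parquet file schema by some other means.

```python
from collections import defaultdict


def find_case_collisions(column_names):
    """Return groups of column names that are identical once lowercased.

    Hypothetical helper: keys are the lowercased names, values are the
    original spellings that collide under case-insensitive matching.
    """
    groups = defaultdict(list)
    for name in column_names:
        groups[name.lower()].append(name)
    # Keep only the names that appear with more than one spelling.
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Made-up column names, not taken from the table in the question.
    columns = ["id", "String", "string", "created_at"]
    print(find_case_collisions(columns))
```

An empty result would suggest that the duplicate key does not come from a simple upper/lowercase clash in the column names.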

I tested data similar to that in the article after converting it to Parquet and did not run into any issue. Please open a support ticket with the Athena team and provide your query ID, table DDL, and sample data if possible so that we can troubleshoot.

REFERENCES:

  1. https://aws.amazon.com/premiumsupport/knowledge-center/json-duplicate-key-error-athena-config/
AWS Support Engineer
Answered 2 years ago

Maybe it's a bug that was fixed 2 years ago and has not been backported by AWS: https://github.com/trinodb/trino/issues/6200

Answered 1 year ago
