HIVE_UNKNOWN_ERROR: Duplicate key string

1

I am trying to run a simple query in Athena on a Hive table stored in S3.

SELECT * FROM "database"."table" limit 10;

But I get the following error: HIVE_UNKNOWN_ERROR: Duplicate key string

These are the characteristics of the table:

Classification: parquet

Input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Serde serialization lib: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

EXTERNAL: true

has_encrypted_data: false

compressionType: none

typeOfData: file

Any ideas?

Asked 2 years ago, 2127 views
2 answers
0

Hello,

From the error message, it looks like your Parquet data has duplicate keys. This error usually appears with JSON data when the duplicates are caused by naming conventions, for example when some fields are in uppercase and some are in lowercase. For that case there is the SerDe property 'case.insensitive'='false' for 'org.openx.data.jsonserde.JsonSerDe'. More details are in link 1.
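As a rough way to check whether case-colliding names could be involved, you can compare the field names case-insensitively. The snippet below is only a sketch: it assumes you have already extracted the column or key names yourself (for example from the table DDL), and the function name and sample names are hypothetical. It does not read Parquet files or call Athena.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that are identical once letter case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the lowercase keys that more than one original name maps to.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical column names, for illustration only.
columns = ["id", "Name", "name", "created_at"]
print(find_case_collisions(columns))  # {'name': ['Name', 'name']}
```

An empty result means no names collide on case, which would point away from the naming-convention explanation.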

I tested data similar to that described in the article after converting it to Parquet and did not run into any issue. Please open a support ticket with the Athena team and provide your query ID, table DDL, and sample data if possible so that we can troubleshoot.

REFERENCES:

  1. https://aws.amazon.com/premiumsupport/knowledge-center/json-duplicate-key-error-athena-config/
AWS
Support Engineer
Answered 2 years ago
0

Maybe it's a bug that was fixed 2 years ago and not backported by AWS: https://github.com/trinodb/trino/issues/6200

Answered 1 year ago
