HIVE_UNKNOWN_ERROR: Duplicate key string

I am trying to run a simple query in Athena on a Hive table stored in S3:

SELECT * FROM "database"."table" limit 10;

but I get the following error: HIVE_UNKNOWN_ERROR: Duplicate key string

These are the characteristics of the table:

Classification: parquet

Input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Serde serialization lib: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

EXTERNAL: true

has_encrypted_data: false

compressionType: none

typeOfData: file

Any ideas?

1 Answer

Hello,

From the error message, it looks like your Parquet data has duplicate keys. With JSON data, when the duplicates are caused by a naming convention (some field names are in uppercase and others in lowercase), the usual fix is to set the SerDe property 'case.insensitive'='false' for 'org.openx.data.jsonserde.JsonSerDe'. More details are in reference 1.
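
If the duplicates do come from column names that differ only in case (one possible cause, not a confirmed one for this table), a quick check is to list the Parquet column names and group them case-insensitively, since Athena and the Hive metastore treat column names as lowercase. The following is a minimal sketch under that assumption: `find_case_insensitive_duplicates` and `parquet_column_names` are hypothetical helper names, reading the schema assumes the third-party `pyarrow` package is installed, and only top-level columns are checked (nested struct fields are not).

```python
from collections import defaultdict


def find_case_insensitive_duplicates(names):
    """Group column names that collide once case is ignored.

    Athena/Hive column names are case-insensitive, so names such as
    "UserId" and "userid" would map to the same column.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only lowercased keys that more than one original name maps to.
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


def parquet_column_names(path):
    """Return the top-level column names from a local Parquet file's schema.

    Assumption: the third-party pyarrow package is installed. It is imported
    here so the pure-Python check above still works without it.
    """
    import pyarrow.parquet as pq

    return pq.read_schema(path).names


if __name__ == "__main__":
    # Hypothetical column names standing in for a real Parquet schema.
    sample = ["UserId", "userid", "event_time", "Event_Time", "country"]
    print(find_case_insensitive_duplicates(sample))
```

For example, `find_case_insensitive_duplicates(parquet_column_names("part-00000.parquet"))` would list any colliding names in a local copy of one data file (the file name is hypothetical). An empty result suggests the duplicates come from somewhere else, such as nested fields.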

I tested data similar to the example in the article, converted it to Parquet, and did not run into the issue. Please open a support ticket with the Athena team and provide your query ID, table DDL, and sample data if possible, so that we can troubleshoot.

REFERENCES:

  1. https://aws.amazon.com/premiumsupport/knowledge-center/json-duplicate-key-error-athena-config/
SUPPORT ENGINEER
answered 3 months ago