HIVE_UNKNOWN_ERROR: Duplicate key string

1

I am trying to run a simple query in Athena on a Hive table stored in S3.

SELECT * FROM "database"."table" limit 10;

But I get the following error: HIVE_UNKNOWN_ERROR: Duplicate key string

These are the characteristics of the table:

Classification: parquet

Input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Serde serialization lib: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

EXTERNAL: true

has_encrypted_data: false

compressionType: none

typeOfData: file

Any ideas?

asked 2 years ago, 2127 views
2 Answers
0

Hello,

From the error message, it looks like your Parquet data has duplicate keys. With JSON data, this error usually comes from column names that differ only by case (some fields in uppercase, some in lowercase). For that case, the 'org.openx.data.jsonserde.JsonSerDe' SerDe has the property 'case.insensitive'='false'. See reference 1 for details.
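To check whether a case-only name clash is even plausible for your table, you could compare the column names after lowercasing them. The snippet below is only a minimal sketch: `find_case_collisions` and the sample column list are hypothetical, and it assumes you have already pulled the column names out of the Parquet schema with a tool of your choice.

```python
from collections import defaultdict


def find_case_collisions(column_names):
    """Group column names that become identical once lowercased."""
    groups = defaultdict(list)
    for name in column_names:
        groups[name.lower()].append(name)
    # Keep only the lowercased keys that more than one original name maps to.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical column list; replace it with the names from your own schema.
columns = ["id", "String", "string", "created_at"]
print(find_case_collisions(columns))  # {'string': ['String', 'string']}
```

An empty result would suggest the duplicate key does not come from case differences in the top-level column names, so the cause would have to be looked for elsewhere.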

I tested data similar to that in the article after converting it to Parquet and did not hit any issue. Please open a support ticket with the Athena team and provide your query ID, table DDL, and sample data if possible so that we can troubleshoot.

REFERENCES:

  1. https://aws.amazon.com/premiumsupport/knowledge-center/json-duplicate-key-error-athena-config/
AWS
SUPPORT ENGINEER
answered 2 years ago
0

Maybe it's a bug that was fixed in Trino 2 years ago and has not been backported by AWS: https://github.com/trinodb/trino/issues/6200

answered a year ago
