HIVE_UNKNOWN_ERROR: Duplicate key string


I am trying to run a simple query in Athena on a Hive table stored in S3:

SELECT * FROM "database"."table" limit 10;

but I get the following error: HIVE_UNKNOWN_ERROR: Duplicate key string

These are the characteristics of the table:

Classification: parquet

Input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

Serde serialization lib: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

EXTERNAL: true

has_encrypted_data: false

compressionType: none

typeOfData: file

Any ideas?

asked 2 years ago · 2128 views
2 Answers

Hello,

From the error message, it looks like your Parquet data has duplicate keys. Because Athena treats column names case-insensitively, keys that differ only in case (for example, Name and name) collide. When this happens with JSON data and the duplicates come from inconsistent naming (some field names are uppercase and others lowercase), you can set the SerDe property 'case.insensitive'='false' on 'org.openx.data.jsonserde.JsonSerDe'. See reference 1 for details.
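For the JSON case, a table definition using this property might look like the sketch below. The database, table, column names, and S3 location are hypothetical, and the sketch assumes each record contains both a "name" and a "Name" key. It combines 'case.insensitive'='false' with the OpenX JSON SerDe's 'mapping.<column>' property so the mixed-case key lands in its own column. Note that this applies to JSON tables only, not to the Parquet table in the question.

```sql
-- Hypothetical names and location, for illustration only.
-- Assumes JSON records with both "name" and "Name" keys:
--   column `name` reads the lowercase key "name";
--   column `name_upper` is mapped to the mixed-case key "Name".
CREATE EXTERNAL TABLE example_db.example_json_table (
  `name`       string,
  `name_upper` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'case.insensitive' = 'false',
  'mapping.name_upper' = 'Name'
)
LOCATION 's3://example-bucket/example-prefix/';
```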

I tested data similar to the example in the article, converted to Parquet, and did not encounter the issue. Please open a support ticket with the Athena team and include your query ID, the table DDL, and, if possible, sample data so we can troubleshoot.
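To collect the table DDL for the ticket, Athena can print the full CREATE statement for an existing table. The database and table names below are hypothetical placeholders; substitute your own.

```sql
-- Prints the DDL (columns, SerDe, input format, location, table properties)
-- for the given table; my_database and my_table are placeholder names.
SHOW CREATE TABLE my_database.my_table;
```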

REFERENCES:

  1. https://aws.amazon.com/premiumsupport/knowledge-center/json-duplicate-key-error-athena-config/
AWS
SUPPORT ENGINEER
answered 2 years ago

This may be a bug that was fixed 2 years ago in Trino but not backported by AWS: https://github.com/trinodb/trino/issues/6200

answered a year ago
