HIVE_UNKNOWN_ERROR: Duplicate key string
I am trying to run a simple query in Athena on a Hive table stored in S3.
SELECT * FROM "database"."table" limit 10;
But I get the following error: HIVE_UNKNOWN_ERROR: Duplicate key string
These are the characteristics of the table:
Input format: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
Serde serialization lib: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
Any ideas?
From the error message, it looks like your Parquet data has duplicate keys. With JSON data, duplicates are usually caused by naming conventions, where some field names are in uppercase and others in lowercase; in that case we set the SerDe property 'case.insensitive'='false' for 'org.openx.data.jsonserde.JsonSerDe'. More details in link 1.
I tested data similar to that mentioned in the article by converting it to Parquet and did not encounter any issue. Please open a support ticket with the Athena team and provide your query ID, table DDL, and sample data if possible so that we can troubleshoot.
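In the meantime, a quick local check for field names that differ only by case may help narrow things down. The following is a minimal sketch, not a confirmed diagnosis of this error: the helper name `find_case_insensitive_duplicates` and the sample column names are hypothetical, and it assumes the duplicates come from names that collide once case is ignored.

```python
from collections import defaultdict


def find_case_insensitive_duplicates(names):
    """Group field names that collide once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the lowercased keys that more than one original name maps to.
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


# Hypothetical column names, for illustration only.
columns = ["id", "String", "string", "created_at"]
duplicates = find_case_insensitive_duplicates(columns)
print(duplicates)
```

If pyarrow is available, the list of names could be read from one of the Parquet files with `pyarrow.parquet.read_schema(path).names` and passed to the helper. An empty result would suggest the duplicate comes from somewhere else, such as the table definition.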