When I use Apache Hive to import Amazon DynamoDB tables into Amazon EMR, I receive the "The provided key element does not match the schema" error.
Resolution
When you have an incorrect schema, corrupt data, or mismatched data, you might receive the following error:
"The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: #ABC###########)"
If incorrect schema, corrupt data, or mismatched data didn't cause the error, then check the Hive application logs. To find the Hive application logs, connect to the primary node of the Amazon EMR cluster and navigate to the /mnt/var/log/hive directory.
If you turned on logging, then you can find the logs in Amazon Simple Storage Service (Amazon S3). Use a path that's similar to s3://log-location/cluster-id/node/primary-instance-id/applications/hive.
Example logs:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"countryasin":"LOCATION '${INPUT}';","hts_type":null,"hts_code":null}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
... 17 more
Caused by: java.lang.RuntimeException: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 0FF3KB36M2SJD8E79BUPOUP943VV4KQNSO5AEMVJF66Q9ASUAAJG)
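When the Hive log is long, it can help to pull out only the rows that Hive reported as failing. The following is a minimal Python sketch, not an official tool. It assumes that you already copied the log text locally and that each failing row appears on one line after the phrase "Hive Runtime Error while processing row", as in the example logs above. The function name extract_failed_rows is hypothetical.

```python
import json
import re

# Matches the row payload that Hive prints after "while processing row".
# Assumption: the payload sits at the end of a single log line.
ROW_PATTERN = re.compile(r"Hive Runtime Error while processing row (\{.*\})\s*$")


def extract_failed_rows(log_text):
    """Return the rows that Hive reported as failing, parsed as dicts."""
    rows = []
    for line in log_text.splitlines():
        match = ROW_PATTERN.search(line)
        if not match:
            continue
        try:
            rows.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            # Keep the raw text when the payload is not valid JSON.
            rows.append({"_raw": match.group(1)})
    return rows
```

If a returned row contains text that looks like HiveQL, such as LOCATION '${INPUT}';, then the job likely read a script file as input data.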
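The row that's in the error message, {"countryasin":"LOCATION '${INPUT}';","hts_type":null,"hts_code":null}, isn't table data. The value LOCATION '${INPUT}'; is a line from the Hive script. This happens when the Hive script is in the same Amazon S3 location as the input files. The import job reads every file in the input location, so it treats the lines of the Hive script as rows and tries to write them to the DynamoDB table. Because those rows don't contain valid key attributes, DynamoDB rejects them with the ValidationException. To resolve this issue, move the Hive script to a different Amazon S3 location.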
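To check whether a script file is in the input location, you can list the objects under the input prefix and flag any that look like scripts. The following Python sketch assumes that boto3 is installed, that AWS credentials are configured, and that Hive scripts use the .q, .hql, or .sql extension. The function names, the extension list, and the example bucket and prefix are hypothetical.

```python
# Assumption: Hive scripts can be recognized by these file extensions.
SCRIPT_SUFFIXES = (".q", ".hql", ".sql")


def find_script_keys(keys, suffixes=SCRIPT_SUFFIXES):
    """Return the object keys that look like scripts rather than input data."""
    return [key for key in keys if key.lower().endswith(suffixes)]


def list_keys(bucket, prefix):
    """List all object keys under an Amazon S3 prefix."""
    import boto3  # Imported here so that find_script_keys works without boto3.

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Example usage with a hypothetical bucket and prefix:
# for key in find_script_keys(list_keys("amzn-s3-demo-bucket", "input/")):
#     print("Move this file out of the input location:", key)
```

Move any flagged file to a prefix that the import job doesn't read, and then run the import again.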
Related information
View Amazon EMR log files