Writing Data Directly from EMR into DynamoDB

A customer has explored writing data directly from EMR into DynamoDB. It works, with one issue: to use the DynamoDBStorageHandler, they have to switch the Hive execution engine from Tez to MR. We can do this, but it means we would need to run two EMR clusters. Can we confirm that their understanding is correct? They believe it is, because they saw failures when testing with Tez but none when using MR. Also, is there a better way to do this that doesn't require two EMR clusters?

AWS Expert · asked 4 years ago · 1,429 views · 1 answer
Accepted answer

The emr-dynamodb-connector's DynamoDBStorageHandler is what EMR uses when Hive, Spark, or MapReduce interacts with DynamoDB tables. Among other parameters, it derives a maxParallelTasks value from the dynamodb.throughput.write.percent setting. That calculation is tied to the MR engine, because the connector was implemented when Hive on MR was the default. So if Tez or Spark is chosen as the Hive engine, writes still succeed and the connector otherwise behaves the same, but the write rate (IOPS) can no longer be controlled or estimated: it then depends mainly on the input size and the EMR cluster's capacity. This can lead to throttling on the table (I'm not sure whether those are the failures you are referring to).

As long as Hive on MR is available, the current recommendation for controlling throughput while writing is to use the MR engine and set it at runtime: add "set hive.execution.engine=mr;" before anything in the Hive script that interacts with DynamoDB. This does not require two EMR clusters. That would only change if the customer moves to a later EMR release that no longer ships Hive with MR; every EMR release available today includes it, so as long as those releases are used there is no issue.
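A minimal sketch of that approach in a single Hive script, assuming a hypothetical DynamoDB table named Orders and a hypothetical Hive source table staging_orders (all table names, column names, and the 0.5 value are illustrative only). Because "set" only changes the current Hive session, the script switches to MR just around the DynamoDB write and then back to Tez, so the same cluster can keep running everything else on Tez.

```sql
-- Use MapReduce for the DynamoDB write so the connector can honor the throughput setting
SET hive.execution.engine=mr;
-- Illustrative value: target roughly half of the table's provisioned write capacity
SET dynamodb.throughput.write.percent=0.5;

-- Hypothetical external table backed by the DynamoDB table "Orders"
CREATE EXTERNAL TABLE ddb_orders (order_id string, amount double)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "Orders",
  "dynamodb.column.mapping" = "order_id:OrderId,amount:Amount"
);

-- Write from a hypothetical Hive staging table into DynamoDB
INSERT OVERWRITE TABLE ddb_orders
SELECT order_id, amount FROM staging_orders;

-- Switch back to Tez for the rest of the script
SET hive.execution.engine=tez;
```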

PS: Even with Hive on MR, EMR cluster capacity can be a secondary factor that limits how quickly the writes complete, but the maximum set by dynamodb.throughput.write.percent will always be respected.

AWS · answered 3 years ago
