Writing Data Directly from EMR into DynamoDB


A customer has explored writing data directly from EMR into DynamoDB. It works, with one issue: to run the DynamoDBStorageHandler, they have to switch the Hive execution engine from Tez to MR. We can do this, but it would mean running two EMR clusters. Can we confirm that their understanding is correct? They believe it is, because their tests failed on Tez but did not fail on MR. Also, is there a better way to do this that doesn't require two EMR clusters?

AWS (Expert)
asked 4 years ago · viewed 1427 times
1 Answer
Accepted Answer

The emr-dynamodb-connector's DynamoDBStorageHandler is what EMR uses when Hive, Spark, or MR interacts with DynamoDB tables. It derives a maxParallelTasks property from the dynamodb.throughput.write.percent config, among other parameters. Because the connector was implemented when Hive on MR was the default, maxParallelTasks is tied to the MR engine. If Tez or Spark is chosen as the Hive engine, the connector still functions and the writes still happen, but the IOPS can no longer be controlled or estimated, and the write rate depends mainly on the input size and the EMR cluster capacity. This could lead to throttling on the table (not sure if those are the errors you refer to).

As long as Hive on MR is available, the current suggestion for controlling write throughput is to use the MR engine and set it at runtime (add "set hive.execution.engine=mr;" before anything in the Hive script that interacts with DynamoDB). This does not require two EMR clusters. That would only change if the customer moves to a later EMR release that doesn't ship Hive with MR; all EMR releases today include it, so continuing to use them causes no issues.
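A minimal sketch of what such a Hive script could look like. The table names (ddb_orders, staged_orders, Orders), the columns, and the 0.5 write percentage are hypothetical placeholders, not taken from the customer's setup; the storage handler class and the dynamodb.* properties are the ones the connector documents.

```sql
-- Switch this Hive session to MapReduce before touching DynamoDB.
-- The setting applies to the session, not to the whole cluster.
SET hive.execution.engine=mr;

-- Share of the table's provisioned write capacity the job may use.
-- 0.5 is only an illustrative value.
SET dynamodb.throughput.write.percent=0.5;

-- Hypothetical Hive table mapped onto a hypothetical DynamoDB table "Orders".
CREATE EXTERNAL TABLE IF NOT EXISTS ddb_orders (
  order_id    string,
  customer_id string,
  total       double
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name"     = "Orders",
  "dynamodb.column.mapping" = "order_id:OrderId,customer_id:CustomerId,total:Total"
);

-- Hypothetical source table; this write into DynamoDB runs on MR.
INSERT OVERWRITE TABLE ddb_orders
SELECT order_id, customer_id, total
FROM staged_orders;
```

Because the engine is set per session, other Hive jobs on the same cluster can keep running on Tez, which is why a second cluster should not be needed.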

PS: Even with Hive on MR interacting with DynamoDB, EMR cluster capacity can be a secondary factor in how quickly the writes are made, but the maximum implied by the dynamodb.throughput.write.percent setting will always be respected.

AWS
answered 3 years ago
