Writing Data Directly from EMR into DynamoDB


A customer has explored writing data directly from EMR into DynamoDB. It works, with one issue: to run the DynamoDBStorageHandler, they have to switch the Hive execution engine from Tez to MR. We can do this, but it would mean running two EMR clusters. Can we confirm that their understanding is correct? They believe it is, because they saw failures when testing with Tez but none when using MR. Also, is there a better way to do this that does not require two EMR clusters?

AWS (Expert)
asked 4 years ago, 1427 views
1 Answer
Accepted Answer

The emr-dynamodb-connector's DynamoDBStorageHandler is used by EMR when Hive, Spark, or MR interacts with DynamoDB tables. It derives a maxParallelTasks property from the dynamodb.throughput.write.percent setting, among other parameters. maxParallelTasks is tied to the MR engine, because the connector was implemented when Hive on MR was the default. If Tez or Spark is chosen as the Hive engine, the connector still functions and the writes still happen, but the IOPS can no longer be controlled or estimated: the write rate then depends mainly on the input size and the EMR cluster capacity. This can lead to throttling on the table (I'm not sure whether those are the errors you are referring to).

As long as Hive on MR is available, the current suggestion for controlling throughput while writing is to use the MR engine and set it at runtime (add "set hive.execution.engine=mr;" before anything else in any Hive script that interacts with DynamoDB). This does not require two EMR clusters. That would change only if the customer moves to a later EMR release that no longer ships Hive with MR; all EMR releases available today include it, so continuing to use them causes no issues.
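As an illustration of the runtime switch, here is a minimal sketch of a Hive script that writes to DynamoDB on the MR engine. The table names (ddb_orders, staging_orders, Orders) and the columns are hypothetical; the storage handler class and the two table properties are the ones the connector documents, but check them against your EMR release.

```sql
-- Run this script on the MR engine so the connector can control write throughput.
SET hive.execution.engine=mr;

-- Hypothetical Hive table mapped onto a hypothetical DynamoDB table named "Orders".
CREATE EXTERNAL TABLE IF NOT EXISTS ddb_orders (
  order_id STRING,
  customer STRING,
  total    DOUBLE
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name"     = "Orders",
  "dynamodb.column.mapping" = "order_id:OrderId,customer:Customer,total:Total"
);

-- staging_orders is a hypothetical source table that already exists in Hive.
INSERT OVERWRITE TABLE ddb_orders
SELECT order_id, customer, total
FROM staging_orders;
```

Because the engine is set inside the script, it should apply only to that Hive session, so other jobs on the same cluster can keep running on Tez.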

PS: Even with Hive on MR interacting with DynamoDB, EMR cluster capacity can be a secondary factor in how quickly the writes are made, but the maximum set through the dynamodb.throughput.write.percent config is always respected.
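A minimal sketch of setting that cap at the top of a Hive script, assuming the MR engine. The value 0.5 is only illustrative; choose it according to how much of the table's provisioned write capacity the job is allowed to use.

```sql
-- The throughput cap is only honored reliably on the MR engine.
SET hive.execution.engine=mr;

-- Illustrative value: aim to use about half of the table's provisioned write capacity.
SET dynamodb.throughput.write.percent=0.5;
```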

AWS
answered 3 years ago
