Is Glue job read throughput on a DynamoDB table capped at 400 to 500?

I am trying to run a PySpark ETL job in Glue that dumps the contents of a DynamoDB table to S3. The table has around 300k items and is around 21 GB in size. Without the job running, the table metrics show a continuous 20 to 75 read capacity units consumed per second. I set the table's read capacity to 1800 and tried dynamodb.throughput.read.percent values of 0.85, 0.95, and 1.5 (85%, 95%, and 150%) using the following method:

datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": dynamo_table_name,
        "dynamodb.throughput.read.percent": "0.95",
    },
    transformation_ctx="datasource0",
)

When I started the job from the AWS web console, I tried maximum capacity settings of 10, 20, and the default 50 under the job configuration tab. No matter which setting I used, the DynamoDB table metrics never showed total consumed read capacity above 500, and there was no throttling. The run with 50 DPU and "dynamodb.throughput.read.percent": "1.5" failed with a Java OOM error after 2 hours 4 minutes.
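As a rough sanity check on those numbers, the sketch below estimates how long a full read of the table would take at the observed ceiling versus the configured target. It assumes the Glue reader consumes capacity like an eventually consistent Scan (one read capacity unit covering two 4 KB reads per second), which I have not verified for the connector, and it ignores item-size rounding and the table's background traffic. The function name scan_minutes is made up for this illustration.

```python
# Back-of-envelope scan-time estimate; every constant here is an assumption.
KB = 1024
GB = 1024 ** 3

# Assumed: one RCU covers two eventually consistent 4 KB reads per second.
BYTES_PER_RCU_PER_SECOND = 2 * 4 * KB


def scan_minutes(table_bytes: int, consumed_rcu: float) -> float:
    """Minutes needed to read table_bytes at a steady consumed_rcu."""
    if consumed_rcu <= 0:
        raise ValueError("consumed_rcu must be positive")
    seconds = table_bytes / (consumed_rcu * BYTES_PER_RCU_PER_SECOND)
    return seconds / 60


table_bytes = 21 * GB
observed = scan_minutes(table_bytes, 500)        # ceiling seen in the table metrics
target = scan_minutes(table_bytes, 1800 * 0.95)  # provisioned RCU x read percent

print(f"at 500 RCU:  {observed:.0f} min")
print(f"at 1710 RCU: {target:.0f} min")
```

Under these assumptions the job would take roughly 3.4 times longer at 500 consumed units than at the 1710 units the settings allow, so the cap matters more and more as tables grow.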

Ultimately, I want to do the same thing with even larger tables, but now I don't know whether that is possible here. Is there a way to get higher read throughput from DynamoDB tables? Also, what can I do to fix the Java OOM error?

asked 5 years ago, 547 views
1 Answer

To answer my own question for anyone with the same problem: the solution was to increase the number of splits, that is, how many partitions the table is divided into while reading, using the dynamodb.splits connection option like so:

datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": dynamo_table_name,
        "dynamodb.throughput.read.percent": "0.95",
        "dynamodb.splits": "100",
    },
    transformation_ctx="datasource0",
)
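For reuse across jobs, the same options can be built by a small helper, sketched below. The names dynamodb_read_options and read_table are hypothetical, not part of Glue. The 0.1 to 1.5 range check reflects my understanding of the documented limits for dynamodb.throughput.read.percent and should be confirmed against the current Glue documentation. The right number of splits depends on the job's worker count, so 100 is just the value that worked here.

```python
from typing import Dict


def dynamodb_read_options(table_name: str,
                          read_percent: float,
                          splits: int) -> Dict[str, str]:
    """Build the connection_options dict used above (hypothetical helper)."""
    # Assumed valid range for the read percent; check the Glue documentation.
    if not 0.1 <= read_percent <= 1.5:
        raise ValueError("read_percent should be between 0.1 and 1.5")
    if splits < 1:
        raise ValueError("splits must be at least 1")
    # Option values are passed as strings, as in the snippet above.
    return {
        "dynamodb.input.tableName": table_name,
        "dynamodb.throughput.read.percent": str(read_percent),
        "dynamodb.splits": str(splits),
    }


def read_table(glue_context, table_name: str, read_percent: float, splits: int):
    """Read the table with the given settings; glue_context is a GlueContext."""
    return glue_context.create_dynamic_frame_from_options(
        connection_type="dynamodb",
        connection_options=dynamodb_read_options(table_name, read_percent, splits),
        transformation_ctx="datasource0",
    )


options = dynamodb_read_options("my_table", 0.95, 100)
print(options)
```

Inside a Glue job, read_table(glueContext, dynamo_table_name, 0.95, 100) would be equivalent to the call above.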
answered 5 years ago
