Is Glue job read throughput from a DynamoDB table capped at 400 to 500?


I am trying to run a PySpark ETL job in Glue that dumps the contents of a DynamoDB table to S3. The table I am testing with has around 300k items and is around 21 GB in size. While the job is not running, the table metrics show a continuous 20 to 75 read capacity units consumed per second. I set the table's read capacity to 1800 and tried throughput percents of 85%, 95%, and 150% using the following method:

datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": dynamo_table_name,
        "dynamodb.throughput.read.percent": "0.95",
    },
    transformation_ctx="datasource0",
)

When I started the job from the AWS web console, I tried a maximum capacity of 10, 20, and the default 50 under the job configuration settings tab. No matter which setting I used, the metrics on the DynamoDB table never showed total consumed read capacity going above 500, with no throttling. The run with 50 DPUs and "dynamodb.throughput.read.percent": "1.5" failed with a Java OutOfMemoryError after 2 hours 4 minutes.

Ultimately I want to do the same thing with even larger tables, but now I don't know if that is possible. Is there a way to get higher read throughput from DynamoDB tables? Also, what can I do to fix the Java OOM error?

asked 5 years ago · 513 views
1 Answer

To answer my own question for anyone with the same problem: the solution was to increase the number of read splits using the dynamodb.splits connection option, like so:

datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": dynamo_table_name,
        "dynamodb.throughput.read.percent": "0.95",
        "dynamodb.splits": "100",
    },
    transformation_ctx="datasource0",
)
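More splits means more, smaller Spark partitions, which both raises parallel read throughput and keeps any single task from holding too much data (the likely cause of the OOM). If it helps, here is a rough sizing heuristic I use; the helper name and the 256 MB per-partition target are my own assumptions, not anything from the Glue docs:

```python
import math

def suggest_splits(table_size_gb, target_partition_gb=0.25, min_splits=1):
    """Hypothetical helper: pick a dynamodb.splits value so each Spark
    partition reads roughly target_partition_gb of table data or less."""
    splits = math.ceil(table_size_gb / target_partition_gb)
    return max(min_splits, splits)

# For the 21 GB table in the question, ~0.25 GB partitions gives:
print(suggest_splits(21))  # 84
```

The suggested value would then be passed as a string, e.g. "dynamodb.splits": str(suggest_splits(21)). You still want enough DPUs/executors to actually run that many tasks in parallel.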
answered 5 years ago
