Glue - Predicate pushdown with DynamoDB


Hello,

    dynamo_df = glueContext.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options={
            "dynamodb.input.tableName": lkup_table,
            "dynamodb.throughput.read.percent": "1.0",
            "dynamodb.splits": "100"
        }
    )

It seems Glue is loading the entire DynamoDB table (lkup_table). If I add a filter, like dynamo_df.filter(col('case') == '1234'), then:

1. Spark first loads the entire table into a DataFrame.
2. It then filters out the records, which isn't an efficient approach.

Is there any way to add a predicate pushdown that avoids loading the complete table into the DataFrame (dynamo_df)? Please suggest.
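For reference, this is roughly what the scan-then-filter pattern described above looks like. Note that col()-style filters apply to a Spark DataFrame, so the DynamicFrame has to be converted first; the table and column names are taken from the question:

    from pyspark.sql.functions import col

    # col()-style filters apply to a Spark DataFrame, so convert the
    # DynamicFrame first.
    lkup_df = dynamo_df.toDF()

    # This filter runs in Spark after the full table has been read;
    # it is not pushed down to DynamoDB.
    filtered_df = lkup_df.filter(col("case") == "1234")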

asked a year ago · 321 views
2 Answers

Unfortunately, DynamoDB does not support predicate pushdown syntax: it's a NoSQL database, and the entire table has to be read regardless before a filter can be applied.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html

If this is a one-time read, then you can consider the export to S3 capability; but if you intend to read continuously, you may just want to read the table directly to get more up-to-date data.
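A minimal sketch of what an export-based read might look like, assuming the connection options documented for the Glue DynamoDB export connector. The table ARN, bucket, and prefix below are placeholders, and the table needs point-in-time recovery enabled for exports to work:

    # Sketch only: reads from a DynamoDB export to S3 instead of
    # scanning the live table. ARN, bucket, and prefix are placeholders.
    exported_dyf = glueContext.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options={
            "dynamodb.export": "ddb",
            "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:111122223333:table/lkup_table",
            "dynamodb.s3.bucket": "my-export-bucket",  # placeholder
            "dynamodb.s3.prefix": "ddb-exports/",      # placeholder
            # Flatten the exported DynamoDB JSON into top-level columns.
            "dynamodb.unnestDDBJson": "true"
        }
    )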

answered a year ago
Accepted Answer

There is predicate pushdown for S3 tables (see the sketch below), but unfortunately not for DynamoDB.
What you can do is minimize the performance hit (and cost) by using the new S3 export for DynamoDB.
Check this blog: https://aws.amazon.com/blogs/big-data/accelerate-amazon-dynamodb-data-access-in-aws-glue-jobs-using-the-new-aws-glue-dynamodb-elt-connector/
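To illustrate the contrast, here is a minimal sketch of predicate pushdown against a partitioned, catalog-backed S3 table; the database, table, and partition column names are hypothetical:

    # Sketch only: push_down_predicate prunes partitions of an
    # S3-backed Data Catalog table before any data is read.
    s3_dyf = glueContext.create_dynamic_frame.from_catalog(
        database="my_database",             # hypothetical
        table_name="my_partitioned_table",  # hypothetical
        # Only partitions matching the predicate are listed and read.
        push_down_predicate="region == 'us-east-1'"
    )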

answered a year ago
reviewed a year ago
