Skip to content

Glue - Predicate pushdown with Dynamodb

0

Hello,

   dynamo_df = glueContext.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={"dynamodb.input.tableName": lkup_table,
        "dynamodb.throughput.read.percent": "1.0",
        "dynamodb.splits": "100"
    }
    )  

It seems Glue is loading entire dynamodb table (lkup_table). If I add filter, like dynamo_df .filter(col('case')=='1234') - 1.Spark first loads entire table into df 2.Then it filterout the records which isn't efficient way. Is there anyway to add predicate pushdown that avoids complete table load into dataframe (dynamo_df)? Pl. suggest

asked 3 years ago912 views
2 Answers
2

Unfortunately DynamoDB does not support predicate push down syntax, as its a NoSQL database and to apply the filter the entire table would need to be read regardless.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html

If this is a one-time read then you can consider the export to S3 capability but if you intend on reading continuously you may just want to read the table directly to get more up-to-date data.

AWS
EXPERT
answered 3 years ago
1
Accepted Answer

There is for s3 tables but unfortunately not for DynamoDB.
What you can do is minimize the performance hit (and cost) by using the new s3 export for DynamoDB.
Check this blog: https://aws.amazon.com/blogs/big-data/accelerate-amazon-dynamodb-data-access-in-aws-glue-jobs-using-the-new-aws-glue-dynamodb-elt-connector/

AWS
EXPERT
answered 3 years ago
AWS
EXPERT
reviewed 3 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.