Glue - Predicate pushdown with DynamoDB


Hello,

    # Read the DynamoDB lookup table into a DynamicFrame.
    # read.percent = share of the table's read capacity to use,
    # splits = number of parallel read partitions.
    dynamo_df = glueContext.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options={
            "dynamodb.input.tableName": lkup_table,
            "dynamodb.throughput.read.percent": "1.0",
            "dynamodb.splits": "100",
        },
    )

It seems Glue is loading the entire DynamoDB table (lkup_table). If I add a filter, like dynamo_df.filter(col('case') == '1234'), then (1) Spark first loads the entire table into the frame, and (2) only afterwards filters out the records, which isn't efficient. Is there any way to add a predicate pushdown that avoids loading the complete table into the DataFrame (dynamo_df)? Please suggest.
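
For context, here is the filtering step spelled out, a minimal sketch of what I am doing today (I convert to a DataFrame first, since DynamicFrame.filter expects a function rather than a column expression):

    from pyspark.sql.functions import col

    # The from_options call above has already scanned the whole table;
    # this filter only runs afterwards, on data that is already loaded.
    filtered = dynamo_df.toDF().filter(col('case') == '1234')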

asked a year ago · 359 views
2 Answers

Unfortunately, DynamoDB does not support predicate pushdown syntax: it is a NoSQL database, and to apply the filter the entire table has to be read regardless.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html
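
To illustrate the point: even DynamoDB's own filtering works this way. A FilterExpression on a Scan is evaluated after the items have been read, so read capacity is consumed for the whole table either way. A minimal boto3 sketch, reusing the table and attribute names from the question (pagination omitted):

    import boto3
    from boto3.dynamodb.conditions import Attr

    table = boto3.resource('dynamodb').Table('lkup_table')

    # The filter runs server-side *after* each page of items is read,
    # so the scan still consumes read capacity for every item scanned.
    response = table.scan(FilterExpression=Attr('case').eq('1234'))
    matching_items = response['Items']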

If this is a one-time read, you can consider the export-to-S3 capability; but if you intend to read continuously, you may just want to read the table directly to get more up-to-date data.
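
A minimal sketch of the export-based read, assuming point-in-time recovery is enabled on the table and that your Glue version supports the export connector (the ARN, bucket, and prefix below are placeholders):

    # "ddb" tells Glue to read via a DynamoDB export to S3
    # instead of scanning the table directly.
    dyf = glueContext.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options={
            "dynamodb.export": "ddb",
            "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:123456789012:table/lkup_table",
            "dynamodb.s3.bucket": "my-export-bucket",   # placeholder
            "dynamodb.s3.prefix": "dynamodb-exports/",  # placeholder
            "dynamodb.unnestDDBJson": True,
        },
    )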

AWS EXPERT
answered a year ago
Accepted Answer

There is for S3 tables, but unfortunately not for DynamoDB.
What you can do is minimize the performance hit (and cost) by using the new S3 export for DynamoDB.
Check this blog: https://aws.amazon.com/blogs/big-data/accelerate-amazon-dynamodb-data-access-in-aws-glue-jobs-using-the-new-aws-glue-dynamodb-elt-connector/
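
For comparison, once the exported data is in S3 and registered in the Glue Data Catalog as a partitioned table, you do get pushdown: push_down_predicate prunes partitions before any data is read. A sketch with hypothetical database, table, and partition-column names:

    # Only partitions matching the predicate are listed and read from S3;
    # everything else is never loaded into the frame.
    dyf = glueContext.create_dynamic_frame.from_catalog(
        database="my_db",                # hypothetical catalog database
        table_name="lkup_table_export",  # hypothetical table over the S3 export
        push_down_predicate="snapshot_date = '2024-01-01'",
    )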

AWS EXPERT
answered a year ago
AWS EXPERT
reviewed a year ago
